RESEARCH REPORT

Assessing Intercultural Competence in Higher Education: 
Existing Research and Future Directions 

Richard L. Griffith,¹ Leah Wolfeld,¹ Brigitte K. Armon,¹ Joseph Rios,² & Ou Lydia Liu²

¹ Institute for Cross Cultural Management, Florida Institute of Technology, Melbourne, FL

² Educational Testing Service, Princeton, NJ


The modern wave of globalization has created a demand for increased intercultural competence (ICC) in college graduates who will
soon enter the 21st-century workforce. Despite wide attention to the concept and assessment of ICC, few assessments meet the
standards for a next-generation assessment in the areas of construct clarity, innovative item types, response processes, and validity evidence.
The objectives of this report are to identify current conceptualizations of ICC, review existing assessments and their validity evidence,
propose a new framework for a next-generation ICC assessment, and discuss key assessment considerations. In summary, we found
the current literature to be murky with respect to the clarity of the ICC construct. Definitions of the construct vary considerably
as to whether ICC is a trait, a skill, or a performance outcome. In addition, current measures of ICC rely too heavily on self-report methods,
which have a number of flaws that result in less-than-optimal assessment. In this report, we propose a new framework based on a
model of the social thinking process developed by Grossman and colleagues that describes the knowledge, skills, and abilities that
promote success in complex social situations. Drawing on this social process model and on Earley and Peterson's definition of ICC (a
person's capability to gather, interpret, and act upon these radically different cues to function effectively across cultural settings or in
a multicultural situation), we develop three stages: approach, analyze, and act. Guided by this framework, we discuss assessment
considerations, such as innovative task types and multiple response formats, that can help translate the framework into an assessment of ICC.

Keywords Intercultural competence; measurement; cross-cultural competence; global competence; international higher education 

doi:10.1002/ets2.12112 


The modern wave of globalization, having long overtaken the business sector, economics, technology, and transportation,
has come to higher education. To compete in the global arena (and, therefore, to solicit international student revenue,
attract high-potential students, and produce effective university ambassadors for increased brand recognition), university
administrators must demonstrate that their institution prepares graduates appropriately for the global workforce. In the
last 8 years, the United States witnessed a 56% increase in international students studying in higher education institutions,
reaching a total of 886,052 international students in the 2013-2014 school year; international students generated 30.5 billion dollars for
the U.S. economy (Institute of International Education, 2015) and created 373,000 jobs (NAFSA: Association of International
Educators, 2016). For years, prestigious programs such as the Fulbright Program have been sending students and
scholars around the world to higher education institutions to facilitate mutual understanding across countries (Bureau
of Educational and Cultural Affairs, 2013). Further, 273,996 U.S. students enrolled in higher education studied abroad in
the 2012-2013 academic year (Institute of International Education, 2015). Thus, increased internationalization in higher
education institutions alone demands that university students develop intercultural competence (ICC) in order to interact
successfully with diverse peers and professors and maximize their collegiate experience.

Being able to communicate and work effectively across cultures has also been identified as a desirable capability by
various organizations with global missions (Bikson, Treverton, Moini, & Lindstrom, 2003) and as even more important to
potential employers than a student's undergraduate major; in fact, 78% of surveyed employers stressed the importance of all
students gaining intercultural skills (Hart Research Associates, 2015). Unsurprisingly, ICC has been identified as an essential
student learning outcome in higher education (Association of American Colleges and Universities, 2011). Accordingly,
higher education institutions in the United States and abroad are increasingly concerned both with preparing students to be
competitive contributors in the global economy and with remaining competitive themselves in international education
and other internationalization efforts (e.g., exchange programs, study abroad experiences, and marketing targeted toward
international students; De Haan, 2014; Scott, 2000). If higher education institutions are to remain relevant, they must take
charge of their internationalization and produce graduates who will excel in the global work arena (e.g., Fellows, Goedde,
& Schwichtenberg, 2014). Meeting the challenge of producing culturally competent graduates requires tracking students'
development of ICC; however, the difficulty of measuring ICC complicates such tracking initiatives.

Although some higher education institutions recognize the importance of measuring their students' ICC, this recognition
has only recently expanded beyond assessing study abroad programs. For instance, the Fund for the Improvement of
Postsecondary Education (FIPSE) program, through the U.S. Department of Education, has developed an international
learning outcomes ranking document to help institutions prioritize and assess components of ICC. (Its website may be
found at http://www2.ed.gov/about/offices/list/ope/fipse/index.html.) Another initiative, At Home in the World: Educating
for Global Connections and Local Commitments (AHITW), sponsored by the American Council on Education
(ACE), highlights the need to include assessment as part of developing student and institutional ICC (ACE, 2016). Thus,
awareness is spreading of the benefits of assessing ICC among all students, not just those who participate in study abroad
or exchange programs. However, as discussed in detail in this report, many of
the measures available to university administrators are self-report measures, some with inadequate evidence of reliability
and validity.

Given that higher education institutions have identified ICC as a valuable student outcome and a marketable indicator
of student and overall institutional success, it is imperative to develop valid and reliable measures of ICC in the context
of higher education. Such an initiative would facilitate assessment of two areas: the capability of institutions to graduate
interculturally competent students and the quality of various educational experiences in terms of student development.
The purpose of this report is to explore the possibility and utility of assessing ICC for students in higher education. To this
end, we review current definitions, existing assessments, and challenges for measuring this multidimensional construct.
We then propose a theoretical model of ICC to guide the design of an assessment that captures the complexity of the construct
while avoiding its common measurement pitfalls. After describing the model, we discuss several measurement
considerations, including task type, response format, and the need for more advanced assessment techniques.

Current State of Assessments, Research, and Challenges 
Definitions of Intercultural Competence in Higher Education 

A review of the literature (see Appendix for a description of the literature search process) revealed a multitude of definitions
of ICC. The ICC definitions (Table 1) used in the higher education literature tend to be associated with models used
in education, training, and research. These models fall into five categories: compositional, co-orientational, developmental,
adaptational, and causal (Spitzberg & Changnon, 2009). Compositional models (e.g., Deardorff, 2006; W. D. Hunter,
White, & Godbey, 2006; Ting-Toomey & Kurogi, 1998) merely describe the characteristics (knowledge, skills, and attitudes)
of ICC. Co-orientational models (e.g., Fantini, 1995; Kupka, 2008; Rathje, 2007) tend to describe the components or
process of a successful intercultural interaction. Developmental models describe ICC in terms of individual development 
over time (e.g., Bennett, 1986; P. M. King & Baxter Magolda, 2005). Adaptational models (e.g., J. W. Berry, Kim, Power, 
Young, & Bujaki, 1989; Gallois, Franklyn-Stokes, Giles, & Coupland, 1988) combine the developmental components of 
the aforementioned models and present them in an interactional context of adapting to a foreign culture. Finally, causal 
path models (e.g., Arasaratnam, 2008; Deardorff, 2006; D. A. Griffith & Harvey, 2000; Hammer, Wiseman, Rasmussen, 
& Bruschke, 1998) attempt to integrate the characteristics of compositional models and situate them in an interaction in 
which variables influence each other to predict ICC. 

A recent review of ICC focusing on research across multiple contexts (Leung, Ang, & Tan, 2014) presented another 
system of grouping ICC models. This system differentiates between models that include intercultural traits, intercultural 
attitudes and worldviews, and intercultural capabilities, or some mix thereof. The term intercultural traits refers to stable
personality traits that drive likely behavior, and they commonly include openness to experience and tolerance for
ambiguity. The term intercultural attitudes and worldviews refers to constructs involving the perception and evaluation of 
information from outside an individual’s own culture. Lastly, the term intercultural capabilities refers to anything that a 
person can do, think, or know that will allow him or her to interact successfully in an intercultural situation. 

Neither scholars in the field of ICC nor higher education administrators have reached a consensus regarding the definition
of ICC and its underlying dimensions. For example, in a recent study, administrators from 24 U.S. postsecondary
institutions rated nine definitions of ICC on a 4-point scale (4 = highly applicable and 1 = not applicable; Deardorff, 2006).
The results demonstrated that Byram's (1997) definition of ICC, which focuses heavily on language proficiency, was the
highest rated (M = 3.5), followed by Lambert's (1994) definition (M = 3.3), which highlights task accomplishment in the
global context (see Table 1; Deardorff, 2006). Responses from administrators also revealed that similar yet distinct
terms were being used to discuss this construct, including cross-cultural competence, global competence, intercultural competence,
and global citizenship (Deardorff, 2006, p. 247), and confirmed the need for a general definition that could be used
across student populations and contexts.

Table 1 notes: a Denotes inclusion in Deardorff (2004), in which HEI administrators rated the definitions. b Denotes models that are language-focused.

In an effort to find a widely agreed-upon definition, Deardorff (2006) also identified three prevalent themes across definitions
generated by individual institutions: "the awareness, valuing, and understanding of cultural differences;
experiencing other cultures; and self-awareness of one's own culture" (p. 247). In the same study, a group
of 23 international scholars rated the same nine definitions; on average, Deardorff's (2004) definition of ICC as "the ability
to communicate effectively and appropriately in intercultural situations based on one's intercultural knowledge, skills, and
attitudes" (p. 194) was the highest rated. In addition, the scholars generated definitions and specific elements of ICC. Seven
definitions and 22 elements were agreed upon by 80% (16 out of 23) of the group, with only one element, understanding
of others' world views, receiving 100% agreement from the raters. Although this particular study may have achieved some
clarity and alignment on defining ICC in the higher education context, broader agreement remains elusive, in part because
of the existence of multiple alternative models (e.g., Fantini & Tirmizi, 2006). In addition, abstract, complex phenomena
are often better defined through the process of measurement; however, many of the existing theories and models of ICC
have not been clarified through validated measurement. Therefore, the framework presented in this report incorporates both
theoretical and measurement considerations.

Discrepancies in Dimensional Models of Intercultural Competence 

This variability in the content of ICC models and dimensions presents several challenges. First, it reduces the conceptual clarity
of the construct itself, as some models include as core components factors that are excluded or treated as antecedents
in other models. For example, tolerance for ambiguity, which refers to the ability to make progress despite high levels of
uncertainty (Bird, Mendenhall, Stevens, & Oddou, 2010), is included in some definitions and measures (e.g., Deardorff,
2006; Gudykunst, 2003) but excluded in others (e.g., Byram, 1997). Second, in addition to reducing the conceptual clarity
of ICC, these discrepancies complicate the specification of ICC's nomological network (i.e., the constructs theoretically or
empirically related to ICC). Specifically, the existing literature has yet to distinguish constructs belonging in the ICC framework
from its correlates. Constructs such as global mindedness, broadmindedness, cosmopolitanism, and global identity
provide prime examples. Because the definitions of these constructs are imprecise and vary considerably, it can be challenging
to determine which of these constructs reflect a subfacet of ICC and which belong to its nomological
network. Third, several constructs overlap substantially with ICC, including the global leadership construct,
which has recently received much attention (Bird et al., 2010). The existing literature has yet to fully delineate where one ends
and another begins (Bücker & Poutsma, 2010). In sum, establishing construct validity for ICC is a less straightforward task
than it is for other, less complex concepts. Any new model of ICC attempting to address these concerns should meet the
following criteria: (a) provide specific definitions of the overall construct and its subdimensions, (b) include both cognitive
and noncognitive components, and (c) clarify the relationships among subdimensions. To date, many of the models
of ICC do not meet these criteria. Although many models are multidimensional in nature, models focusing only on
attitudes (or attitudes and cognitions) are prevalent, and these lack a focus on the behavioral or performance-relevant
component of ICC. Others rely on weak definitions or do not clarify the relationships among subdimensions.

Malleability of Intercultural Competence in the Higher Education Context 

Some evidence suggests that ICC is a malleable skill and that higher education experiences influence the development of 
these competencies for both educators and students (e.g., Eisenberg et al., 2013). Most intercultural education research 
focuses on best practices to train K-12 teachers to work effectively with diverse student populations (DeJaeghere &
Cao, 2009; DeJaeghere & Zhang, 2008; Teräs & Lasonen, 2013). Similarly, the research on ICC in higher education
focuses on training international education professionals, which include roles such as collegiate language instructors, 
study abroad and international student advisors, faculty members, and other professionals supporting international 
educational exchange programs (Paige & Goode, 2009, p. 333). 
A small body of research focuses on student development (e.g., Conway, 2008; DeJaeghere & Zhang, 2008; Fischer, 2011;
Hao, 2012; Jauregui, 2013; Kahr-Gottlieb & Papst, 2013; Kaufmann, Englezou, & Garcia-Gallego, 2014; Zhang, 2012). 
These studies indicate that ICC may be improved with training, including study abroad programs (e.g., Engle & Crowne, 
2014) and intercultural business courses (e.g., Eisenberg et al., 2013; Rosenblatt, Worthley, & MacNab, 2013). Despite the 
prevalence of the training and activities surrounding this area, the empirical evidence documenting their effectiveness is 
nascent, precluding strong conclusions on the best ways to improve ICC. However, initial evidence suggests that ICC is a 
malleable construct and that higher education may improve students’ ICC (e.g., Williams, 2005). 

Existing Assessments of Intercultural Competence 
Multidimensional Nature of Intercultural Competence Assessments 

Corresponding to the wide-ranging models and conceptualizations of ICC reviewed in the previous section, existing 
assessments of ICC vary in the number of constituent constructs and dimensions to be measured. Some scholars operationalize
ICC as unidimensional and measure it with all items loading onto one factor (e.g., Global Perspective Survey;
Hanvey, 1982), whereas others argue that ICC is multidimensional, including dimensions such as approachableness,
intercultural receptivity, positive orientation, forthrightness, social openness, enterprise, respectfulness, flexibility, perseverance,
cultural perspectivism, venturesome, and social confidence (e.g., Intercultural Competency Scale; Elmer, 1987).
Table 2 presents existing assessments used to measure ICC in higher education and business contexts, including those 
reviewed by Fantini (2009) but excluding those that measure language ability. 

The ICC instruments reviewed in this study vary substantially in terms of how they define the ICC dimensions. Some 
assessments conceptualize ICC as having separate, broad dimensions such as cognitive, interpersonal, intrapersonal, 
metacognitive, affective, motivational, and behavioral, but others use terms such as knowledge, skills, attitudes, processes, 
and awareness. Despite their differences in categorization, ICC instruments have overlapping dimensions. For example, 
the dimensions of openness, flexibility, and empathy appear in multiple assessments. Additionally, several models nest 
specific competencies and traits within subdimensions (e.g., the cultural intelligence construct divides its competencies 
into metacognitive, cognitive, behavioral, and motivational domains; Earley & Ang, 2003). 

Assessment Formats 

Currently, two predominant assessment formats are used to measure ICC: surveys and portfolio assessments. All of the 
instruments reviewed in Table 2 are administered as surveys ranging in length from nine items (i.e., Global Perspective
Survey; Hanvey, 1982) to over 160 items (i.e., Intercultural Communication and Collaboration Appraisal; Messner &
Schafer, 2012). Typically, these surveys are delivered online, though some assessments (e.g., Intercultural
Development Inventory; Hammer, Bennett, & Wiseman, 2003) are also offered in a paper-and-pencil format. This
report reviewed only ICC assessments that used selected-response items exclusively.

In addition to surveys, portfolios that include constructed-response items are also used to assess ICC in higher education.
A portfolio assessment is a collection of materials produced by an individual over time, scores from various
assessments, or both. Currently, no standard portfolio assessment exists, meaning that the content, platform (paper vs. digital),
and scoring method vary across institutions, studies (e.g., Ingulsrud, Kai, Kadowaki, Kurobane, & Shiobara, 2002;
Jacobson, Sleicher, & Burke, 1999), and contexts (e.g., foreign language courses, study abroad experiences, general
education). This lack of standardization can also be viewed as an advantage: Portfolios are able to capture context-specific skills
(e.g., writing business letters for a local business owner in a developing country) and the development of those skills over time. Thus,
ICC is captured through the collection of work products from different time points in a student's career (e.g., before,
during, and after an experience abroad; Ingulsrud et al., 2002; Jacobson et al., 1999).

Some higher education institutions worldwide use digital portfolios. For example, Alliant International University 
uses a digital portfolio format to assess ICC in its study abroad students. Clemson University also uses a digital portfolio 
and requires all students to provide evidence of cross-cultural awareness as a universal general education requirement, 
regardless of participation in programs abroad. Evidence of cross-cultural awareness, which Clemson University (2016) 
defines as “the ability to critically compare and contrast world cultures in historical and/or contemporary contexts” (bullet 
2), is demonstrated in digital portfolios through the inclusion of writing samples. Although digital portfolios have the
capability to include other work products such as audio and video recordings of intercultural communication (Deardorff,
2009), we did not identify any institutions that actually request such products.

As with any assessment, the format largely depends on the assessment's intended purpose. Although ICC experts
suggest that more than one methodology (i.e., both qualitative and quantitative methods) should be used to measure ICC 
(Deardorff, 2006; Fantini, 2009), assessing ICC for higher education institutions to provide benchmark information about 
students’ ICC requires a format that allows meaningful comparisons of individuals and groups of examinees. For this 
purpose, portfolios may not be a feasible assessment format, as it is challenging to standardize the various work products 
submitted by students and to ensure interrater reliability in scoring student work. A survey, however, can be standardized 
and norm referenced to allow higher education institutions to make inferences about the ICC of both an individual and 
a group. Moreover, surveys can include multiple types of selected-response item formats that may better capture the 
multidimensional nature of ICC. For example, Likert-scale responses may be adequate to capture attitudinal components 
of ICC, but forced-choice or multiple-choice questions may be more appropriate to assess the knowledge and skills that
characterize ICC. In the following section, we discuss the possible item types and their strengths and weaknesses within 
the category of selected-response items. 
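The two concerns raised above, interrater reliability when scoring portfolios and norm referencing for surveys, can be made concrete with a short Python sketch. It assumes two raters assign categorical scores to the same set of portfolio work products and uses Cohen's kappa, one common chance-corrected agreement index (the report does not prescribe a particular index). It also computes a percentile rank against a norm group, counting ties as half. The function names and example data are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def percentile_rank(score: float, norm_scores: list[float]) -> float:
    """Percentage of the norm group scoring below `score`, counting ties as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters scoring the same work products."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of products")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if each rater assigned categories independently at their own base rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single identical category")
    return (observed - expected) / (1 - expected)
```

For example, `percentile_rank(15, [10, 12, 15, 15, 18, 20, 22, 25])` returns 37.5: two norm-group scores fall below 15 and two tie with it. A kappa near 0 would indicate that two raters agree on portfolio scores little more often than chance, which is the standardization problem the paragraph describes.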

Intercultural Competence Selected-Response Item Types 
Likert-Scale Items 

Most ICC assessments reviewed in this study attempt to capture components of ICC using self-report Likert items. Likert-scale
items typically ask respondents to rate their agreement with a given statement on a scale that ranges from one
extreme to another (e.g., strongly agree to strongly disagree). Some assessments use anchors that directly ask respondents to
assess themselves on a particular skill. For example, a behavioral regulation item may ask respondents to indicate whether
they would change their behavior in accordance with cultural customs. Another variation across ICC assessments with
Likert-scale items is the number of response categories, or points, on the response scale. Most assessments use a 5-point
Likert scale, although others range from a 4-point to a 7-point scale.
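As a minimal sketch of how Likert-scale responses are commonly turned into a scale score, the Python code below averages item responses after reverse-coding negatively worded items. Reverse keying and the item names are assumptions for illustration; the instruments reviewed here are not described at that level of detail. The `points` parameter accommodates the 4- to 7-point scales in use.

```python
def score_likert(responses: dict[str, int], reverse_keyed: set[str], points: int = 5) -> float:
    """Return the mean item score on a 1..points Likert scale.

    Items listed in `reverse_keyed` are negatively worded, so a response r
    is recoded as (points + 1 - r) before averaging.
    """
    if not responses:
        raise ValueError("no responses to score")
    scored = []
    for item, value in responses.items():
        if not 1 <= value <= points:
            raise ValueError(f"{item}: response {value} is outside 1..{points}")
        scored.append(points + 1 - value if item in reverse_keyed else value)
    return sum(scored) / len(scored)


# Hypothetical items on a 5-point scale (1 = strongly disagree, 5 = strongly agree).
answers = {
    "adjusts_behavior_to_local_customs": 4,
    "avoids_unfamiliar_situations": 2,  # negatively worded, so 2 is recoded to 4
    "enjoys_cultural_differences": 5,
}
print(score_likert(answers, reverse_keyed={"avoids_unfamiliar_situations"}))
```

Here the recoded responses are 4, 4, and 5, so the scale score is their mean, 13/3.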

Although most of the Likert-type items are self-report, one assessment included in our review used Likert-type 
responses for peer assessments. The Behavioral Assessment Scale for Intercultural Communication (BASIC; Koester & 
Olebe, 1989) uses a 4-point Likert scale in a peer rating of intercultural communication effectiveness. This instrument 
was adapted from Ruben’s (1976) behavioral assessment of communication competency for intercultural adaptation. 
(See Chen, 1992, for a review.) The instrument was designed to fit the context of intercultural roommates in a university 
setting in which one roommate is native to the United States and the other is an international student. Roommates rate 
each other on eight items measuring the following aspects of ICC: display of respect, interaction posture, orientation to 
knowledge, empathy, task-related roles, relational roles, interaction management, and tolerance for ambiguity. Unlike the 
other ICC assessments, each one-item scale presents the rater with a behavioral description of the roommate being
rated for each of the four points on the Likert scale. The BASIC is the only ICC assessment identified that includes
this use of descriptions for Likert-scale anchors (similar to anchored vignettes; G. King, Murray, Salomon, & Tandon, 
2004), as the majority of assessments use more traditional Likert-scale response categories (i.e., strongly agree to strongly 
disagree). 
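The difference between the BASIC format and agree/disagree anchors can be sketched as a data structure: each one-item scale stores one behavioral description per scale point, and the rater's score is the position of the description they choose. The eight dimension names come from the description above; the anchor wording below is a placeholder rather than the actual BASIC text, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

BASIC_DIMENSIONS = (
    "display of respect", "interaction posture", "orientation to knowledge", "empathy",
    "task-related roles", "relational roles", "interaction management", "tolerance for ambiguity",
)


@dataclass(frozen=True)
class AnchoredItem:
    dimension: str
    anchors: tuple[str, ...]  # behavioral descriptions ordered from scale point 1 to 4

    def score(self, chosen: str) -> int:
        # The rater picks the description that best fits the roommate; its position is the score.
        # tuple.index raises ValueError if the choice is not one of this item's anchors.
        return self.anchors.index(chosen) + 1


def make_placeholder_items() -> list[AnchoredItem]:
    # Placeholder anchor text only; the real BASIC descriptions are not reproduced here.
    return [
        AnchoredItem(d, tuple(f"{d}: behavior described at point {p}" for p in range(1, 5)))
        for d in BASIC_DIMENSIONS
    ]


def rate_roommate(items: list[AnchoredItem], choices: dict[str, str]) -> dict[str, int]:
    """Map each dimension to the scale point of the description the rater chose."""
    return {item.dimension: item.score(choices[item.dimension]) for item in items}
```

Because the rater selects a description rather than a level of agreement, every score is tied to an observable behavior, which is the feature that distinguishes this format from traditional Likert anchors.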

Multiple-Choice Items 

To directly measure the knowledge components of ICC (i.e., language and cultural knowledge), multiple-choice items are 
typically used, such as in the Global Awareness Profile (GAP; Corbitt, 1998) and the Global Competence Aptitude Assessment
(W. D. Hunter et al., 2006). These assessments differ in that some multiple-choice items assess cultural knowledge
that is general or global and others assess knowledge that is specific to one culture. An example of a global culture item 
would be something akin to “What is the most popular sport in the world?” As one can see, such an item does not ask 
about one particular culture, but rather references the general world population. 
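A minimal sketch of number-correct scoring for multiple-choice knowledge items, reported separately for culture-general and culture-specific content, follows. The answer key, item identifiers, and the choice to count skipped items as incorrect are assumptions for illustration, not the scoring rules of any instrument named here. The same mapping could hold finer labels, such as world regions, instead of the two broad categories.

```python
from collections import defaultdict


def score_by_subscale(
    answers: dict[str, str],
    key: dict[str, str],
    subscale_of: dict[str, str],
) -> dict[str, tuple[int, int]]:
    """Return {subscale: (number correct, number of items)}.

    Items the examinee skipped are counted as incorrect.
    """
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for item_id, correct_option in key.items():
        tally = totals[subscale_of[item_id]]
        tally[1] += 1
        if answers.get(item_id) == correct_option:
            tally[0] += 1
    return {name: (correct, n) for name, (correct, n) in totals.items()}


# Hypothetical four-item form.
key = {"q1": "B", "q2": "A", "q3": "D", "q4": "C"}
subscale_of = {
    "q1": "culture-general", "q2": "culture-general",
    "q3": "culture-specific", "q4": "culture-specific",
}
print(score_by_subscale({"q1": "B", "q2": "C", "q3": "D"}, key, subscale_of))
```

In this example the examinee answers one item correctly in each category and skips q4, so both subscales come out as 1 correct out of 2.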

In addition to culture-general knowledge, the GAP uses multiple-choice items to assess knowledge of the environment,
politics, geography, religion, and socioeconomics of six regions (Asia, Africa, North America, South America, the Middle
East, and Europe) around the world. In contrast, the Global Competence Aptitude Assessment (Global Leadership
Excellence, 2010) uses multiple-choice items based on specific cultures, without any culture-general items. An example
of a culture-specific item is, "When greeting a colleague from Chile, one must ..." Drawing on the norms of the culture and
the context of the situation described, the examinee selects the most appropriate response from a list of choices.

Figure 1 Screenshot of the Implicit Association Test, a test of hidden bias. Retrieved from UnderstandingPrejudice.org (http://understandingprejudice.org). Copyright ©2002-2016 by S. Plous. Reprinted with permission.

Implicit Association Tests and Q-Sort Methodology 

Less common item formats that have been employed to assess the attitudinal component of ICC include implicit association
tests (IATs) and the Q-sort methodology. IATs typically capture how strongly a test taker relates two mental
representations, or concepts, by measuring the response time (latency) for making the correct association (Greenwald,
Poehlman, Uhlmann, & Banaji, 2009). This approach assumes that the faster a test taker matches an object to a concept, the stronger
the association the test taker perceives between those concepts. One IAT, the Tests of Hidden Bias, assesses negative
prejudices toward various ethnic groups by presenting examinees with a photo of a White/Caucasian face next to an
African American face on a computer screen and requiring the participant to quickly select the "good" or "bad" photo.
Figure 1 presents a screenshot of the free online test. Because in this case there is no correct association, per se, the authors
state that "faster responses for the {Black+positive|White+negative} task than for the {White+positive|Black+negative}
task indicate a stronger association of Black than of White with positive valence" (Greenwald et al., 2009, p. 18). Such IATs
have been criticized as being too specific to the context of the United States, a country in which race has historically been
conceptualized as ethnically dichotomous (i.e., Black vs. White). In response, other IATs have been developed for
other cultures (e.g., a Romanian IAT; Bazgan & Norel, 2013).
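The latency logic described above can be sketched with a simplified version of the D measure commonly used to score IATs: the difference between mean latencies in the two pairing blocks, divided by the standard deviation of all retained trials from both blocks. This sketch drops trials slower than 10,000 ms but omits the error penalties and practice-block handling of the full algorithm, so it is illustrative only and is not the scoring used by the Tests of Hidden Bias. The latencies are invented.

```python
from statistics import fmean, stdev


def simplified_d_score(
    block_a_ms: list[float],
    block_b_ms: list[float],
    max_latency_ms: float = 10_000,
) -> float:
    """Simplified IAT D score: (mean B - mean A) / SD of all retained trials.

    Blocks A and B use the two opposite category pairings. A positive value
    means responses were faster in block A, i.e., a stronger association for
    the pairing used in block A.
    """
    a = [t for t in block_a_ms if t <= max_latency_ms]  # drop very slow trials
    b = [t for t in block_b_ms if t <= max_latency_ms]
    if len(a) < 2 or len(b) < 2:
        raise ValueError("need at least two retained trials in each block")
    inclusive_sd = stdev(a + b)  # one SD computed over both blocks together
    return (fmean(b) - fmean(a)) / inclusive_sd


# Hypothetical latencies: block A = {Black+positive | White+negative},
# block B = {White+positive | Black+negative}.
d = simplified_d_score([600, 650, 700, 15_000], [800, 850, 900])
print(round(d, 2))
```

Dividing by a pooled standard deviation, rather than reporting a raw millisecond difference, keeps generally slow responders from appearing to hold stronger associations simply because all of their latencies are larger.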

Q-sort is another method that has been used in ICC assessments. The Q-sort methodology has been used in many areas 
of psychology and involves rank ordering of subjective concepts. The Intercultural Communication and Collaboration 
Appraisal tool (ICCA) developed by Messner and Schafer (2012) uses the Q-sort methodology in that it requires examinees
to sort cards (or concepts, if administered online) in response to a given prompt. The ICCA includes two Q-sorts. In the
first, the examinee sorts 48 attitudes, behaviors, and beliefs in order from most to least self-descriptive. In the second,
the examinee selects the six most important intercultural competencies from a set of 12 competencies and ranks them
in order of importance.
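Q-sorts are often administered with a forced, roughly normal distribution, so that only a fixed number of statements may be placed in each pile, and two sorts are then compared by correlating pile values across statements. The Python sketch below assumes a hypothetical seven-pile distribution for 48 statements, mirroring the size of the first ICCA sort; the report does not specify the ICCA's actual pile sizes or scoring, and the function names are illustrative.

```python
from collections import Counter
from math import sqrt

# Hypothetical forced distribution for 48 statements:
# pile value (+3 = most descriptive, -3 = least descriptive) -> required count.
FORCED_DISTRIBUTION = {3: 3, 2: 6, 1: 9, 0: 12, -1: 9, -2: 6, -3: 3}


def sort_from_ranking(ranked: list[str], distribution: dict[int, int] = FORCED_DISTRIBUTION) -> dict[str, int]:
    """Place statements, listed from most to least descriptive, into piles +3 ... -3."""
    slots = [pile for pile in sorted(distribution, reverse=True) for _ in range(distribution[pile])]
    if len(slots) != len(ranked):
        raise ValueError(f"expected {len(slots)} statements, got {len(ranked)}")
    return dict(zip(ranked, slots))


def validate_sort(sort: dict[str, int], distribution: dict[int, int] = FORCED_DISTRIBUTION) -> None:
    """Raise ValueError if the pile sizes differ from the forced distribution."""
    counts = dict(Counter(sort.values()))
    if counts != distribution:
        raise ValueError(f"pile sizes {counts} do not match {distribution}")


def q_correlation(sort_a: dict[str, int], sort_b: dict[str, int]) -> float:
    """Pearson correlation between two sorts of the same statements."""
    if set(sort_a) != set(sort_b):
        raise ValueError("both sorts must cover the same statements")
    items = sorted(sort_a)
    xs = [sort_a[s] for s in items]
    ys = [sort_b[s] for s in items]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


statements = [f"statement_{i}" for i in range(48)]  # hypothetical labels, most to least descriptive
self_sort = sort_from_ranking(statements)
validate_sort(self_sort)
```

Because the distribution is symmetric, a sort and its exact reversal correlate at -1, and any sort correlates with itself at 1; an examinee's sort could, for instance, be compared in this way with a reference profile.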

Situational Judgment Tests 

Another method of assessing ICC is the situational judgment test (SJT). SJTs aim to measure an ability or competency
based on the participant's choice of response to a hypothetical situation. After reading a few sentences representative
of a real-world situation, participants select the most appropriate option from the presented set or respond to an
open-ended prompt. Most SJT prompts focus on behavioral and knowledge components. Behavioral prompts such as "What
would you do?" require the participant to indicate, from a series of potential actions, the behavior they would most likely
engage in (Whetzel & McDaniel, 2009). The options are often scored on a scale of most effective, neutral, and ineffective
behavior to produce a composite score for the SJT. Knowledge prompts such as "What is the best answer?" require the
participant to choose the correct answer in the given situation. Sometimes participants are required to rank the responses
in order from most effective to least effective (Whetzel & McDaniel, 2009). According to a recent review, SJTs demonstrate
substantial criterion, content, and face validity (Whetzel & McDaniel, 2009). For example, McDaniel, Morgeson,
Finnegan, Campion, and Braverman's (2001) meta-analysis yielded an adjusted correlation of .34 between SJTs and job
performance, supporting the criterion-related validity of SJTs.
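The two SJT response formats described above can each be scored in a few lines. The sketch below assumes a hypothetical key that labels each option effective (+1), neutral (0), or ineffective (-1) and sums the points of the chosen options into a composite. For the ranking format it uses the total absolute difference between the examinee's and the key's rank positions, which is only one of several possible rules. Scenario and option labels are invented for illustration.

```python
EFFECTIVENESS_POINTS = {"effective": 1, "neutral": 0, "ineffective": -1}


def sjt_composite(choices: dict[str, str], key: dict[str, dict[str, str]]) -> int:
    """Sum the effectiveness points of the option chosen for each scenario."""
    return sum(EFFECTIVENESS_POINTS[key[scenario][option]] for scenario, option in choices.items())


def rank_distance(examinee_order: list[str], key_order: list[str]) -> int:
    """Total absolute difference in rank position; 0 means the orders match exactly."""
    if sorted(examinee_order) != sorted(key_order):
        raise ValueError("both orders must rank the same options")
    key_pos = {opt: i for i, opt in enumerate(key_order)}
    return sum(abs(i - key_pos[opt]) for i, opt in enumerate(examinee_order))


# Hypothetical key for two scenarios with three options each.
key = {
    "meeting_delay": {"a": "effective", "b": "neutral", "c": "ineffective"},
    "gift_refused": {"a": "ineffective", "b": "effective", "c": "neutral"},
}
print(sjt_composite({"meeting_delay": "a", "gift_refused": "b"}, key))
print(rank_distance(["a", "c", "b"], ["a", "b", "c"]))
```

The composite rewards effective choices and penalizes ineffective ones, whereas the ranking rule credits partial knowledge: swapping two adjacent options costs 2 points rather than the whole item.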

However, because many SJT items are multidimensional, SJTs typically have low internal consistency as indicated
by Cronbach's alpha. For this reason, experts recommend using parallel-forms or test-retest reliability instead of
Cronbach's alpha when examining the reliability of SJT items (Whetzel & McDaniel, 2009). The "correct"
response option can also be contested, as it is often determined by consensus, which may bias the test. For
cross-cultural SJTs, this method may be open to bias if test developers are not conscious of their own cultural assumptions.
Applicants typically react favorably to this type of test (Lievens, Peeters, & Schollaert, 2008). Moreover, by assessing
intentions, this test type captures more direct indicators of behavior than attitudinal measures do and is well suited
to measuring skills. Nevertheless, scores on these items are not immune to inflation by practice effects and participant
deception.
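To make the reliability recommendation concrete, the sketch below computes Cronbach's alpha, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items, alongside a test-retest coefficient computed as the Pearson correlation between two administrations. Alpha is high only when items covary strongly, so items that tap different dimensions, as SJT items often do, pull it down even when total scores are stable over time. The data are invented for illustration.

```python
from math import sqrt
from statistics import fmean, variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """scores[p][i] is person p's score on item i."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance(col) for col in zip(*scores)]  # sample variance of each item
    total_var = variance([sum(row) for row in scores])  # sample variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Test-retest reliability as the correlation between two administrations."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical data: four respondents, two items.
responses = [[2, 3], [4, 3], [3, 5], [5, 5]]
print(round(cronbach_alpha(responses), 3))
# Hypothetical total scores for the same people at two time points.
print(round(pearson_r([10, 12, 14, 16], [11, 12, 15, 16]), 3))
```

In this toy example, alpha is 8/13 (about .62), while the test-retest correlation is about .98, illustrating how the two indices can diverge.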

Only a few SJTs relevant to the ICC context exist, although the critical incident format used in SJT items is
found in cultural assimilators, a form of cross-cultural training in which participants are presented with cultural
scenarios and alternative behavioral options that they then discuss (Bhawuk, 2001; Earley & Peterson, 2004). The Cultural
Intelligence Assessment (Thomas et al., 2015) asks test takers to choose among a set of behaviors to indicate which one
they believe to be the most correct choice for a given scenario. Participants are asked to complete 14 questions designed
to measure cultural knowledge, skills, and metacognition. Another SJT, designed to measure cross-cultural social intelligence
(CCSI; Ascalon, Schleicher, & Born, 2008), asks participants to rate the likelihood that they would perform each
of four behavioral options in response to a series of cross-cultural scenarios. The four options fall into specific categories
(nonempathetic, nonethnocentric; nonempathetic, ethnocentric; empathetic, nonethnocentric; and empathetic, ethnocentric),
allowing for the creation of two subscales: empathy (α = .61) and ethnocentrism (α = .71). Coefficient alpha for
the overall scale was α = .68 (Ascalon et al., 2008).

The CCSI is an example of an SJT measure relevant to ICC that demonstrates evidence of relationships with conceptually
related constructs such as cognitive ability (e.g., GMAT; r = .30) and personality (Ascalon et al., 2008). The GMAT has
been shown to have adequate reliability (α = .92 for the test as a whole). For personality, the correlations between CCSI
scores and three subdimensions of Goldberg's (1999) International Personality Item Pool (IPIP) (conscientiousness,
emotional stability, and openness to experience) averaged r = .30. The IPIP also demonstrates adequate overall internal
reliability (α = .80). The CCSI itself has somewhat low reliability (α = .68 overall, α = .61 for the empathy subscale, and
α = .71 for the ethnocentrism subscale), but these coefficients are roughly similar to those in other SJT studies (Chan
& Schmitt, 1997). Combined, the evidence of internal consistency and convergent validity was taken as a strong indicator
of the initial validity of both the measure and the use of SJTs to assess ICC. To the best of our knowledge, however, no
SJT specific to ICC presents evidence of criterion validity (Ascalon et al., 2008).

Simulation-Based Measurement 

Although commonly used as training tools for the development of ICC, simulations have also been used to assess ICC
(e.g., Harrison, 1992; Jarrell, Alpers, Brown, & Wotring, 2008). Simulations involve role-playing activities in which
participants engage in a limited intercultural scenario. The simulation may require the participant to interact with a
confederate (a paid assistant who has been instructed to act in a particular way) or an avatar (a figure representing a
person or a computer-simulated character) who may enact his or her own cultural norms, the cultural norms of a different
group, or fictitious norms. Depending on the simulation, other participants may play this role instead of confederates.
Perhaps the best-known and most commonly conducted intercultural simulation is the BaFá BaFá simulation (Shirts, 1977).
This simulation requires students to pretend to belong to two fictional cultures and interact with each other in an
attempt to collect a certain number of cards, the exact nature of which depends on their culture. The two cultures are
loosely designed to polarize individualism-collectivism differences (preference for group vs. individual), with verbal
and nonverbal differences included (e.g., preferences for volume and personal space). Beyond accomplishment of the game
goals, observers could also gather interaction data to assess the behavioral component of ICC. This measure would have to
be validated, however, as the current simulation kit does not include a behavioral checklist.
A more psychometrically sound example is a simulation by Harrison (1992), in which participants, acting as managers,
interacted with a confederate playing a Japanese employee. The interaction was then independently rated by two judges on
maintaining harmony, soliciting employee input, demonstrating personal concern, improving consensus, and reducing
conflict (Bhawuk & Brislin, 2000). Another well-known cultural simulation is the Robin Sage Exercise (Skinner, 2002),
which serves as the culminating training activity for the Army Special Forces Qualification Course. This 2-week training
exercise and assessment involves an intensive military simulation in the fictional country of Pineland, encompassing
over 8,000 miles of North Carolina and using thousands of volunteers (Parkins & Williams, 2011). Although this exercise
has been restricted to the military context, it does expressly assess ICC and therefore demonstrates the use of
simulation for ICC measurement.

Validity and Reliability Evidence of Existing Assessments 

According to the Standards for Educational and Psychological Testing (American Educational Research Association [AERA],
American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014), every
assessment should (a) produce consistent and accurate scores (reliability) and (b) provide sufficient evidence to support
that it accurately measures what it is intended to measure (validity). In this section, we first discuss reliability
evidence for the previously developed ICC assessments reviewed in this study. We then discuss validity evidence regarding
internal structure, relationships with conceptually related constructs, and relationships with criteria. A summary
of the reliability and validity evidence is presented in Table 3.

Test and Scale Reliability 

As previously discussed, the majority of ICC assessments consist exclusively of Likert-type items, and their test and
scale reliability evidence was generally adequate. Over 90% of the scales provided evidence of adequate reliability, most
commonly assessed via coefficient alpha (α), an index of internal consistency based on the average intercorrelations among
test items and the number of items. However, for ICC assessments with more than one subdomain, several measures with
adequate overall alpha values (e.g., Cross-Cultural Adaptability Inventory [CCAI]; Davis & Finney, 2006) had subscale
scores that dipped below .70, the common cutoff for acceptability (Kline, 2000). Although fewer in number, other scales
provided evidence of adequate reliability using test-retest (e.g., Inventory of Cross-Cultural Sensitivity; Bazgan & Norel,
2013) and alternate forms evidence (e.g., Cross-Cultural Sensitivity Scale; Pruegger & Rogers, 1993). For scale-specific
reliability information, see Table 3.
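Coefficient alpha can be computed directly from item scores, as in the minimal sketch below; the response data are hypothetical. The .70 cutoff (Kline, 2000) would then be applied to the result for each scale or subscale.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Coefficient alpha from item-score columns.

    items: list of k lists; items[i][j] is respondent j's score on item i.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
    Population variances are used throughout; the n/(n-1) factor would
    cancel in the ratio, so sample variances give the same result.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    totals = [sum(person) for person in zip(*items)]
    item_variance_sum = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / pvariance(totals))


# Hypothetical 5-point Likert responses: 3 items x 5 respondents.
items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 3, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))  # 0.886
```

Because alpha rises with the number of items as well as with their average intercorrelation, short subscales tend to fall below .70 even when the full scale does not, which is consistent with the subscale pattern described above.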

Validity Evidence Regarding Internal Structure 

One important aspect of validity evidence is the internal structure (i.e., dimensionality) of the assessments, which
indicates whether the associations among test items correspond to one or more intended domains (or dimensions) of the
assessment (AERA et al., 2014). One of the most commonly used methods to evaluate internal structure is confirmatory
factor analysis (CFA; Rios & Wells, 2014). Acceptable model fit indicates that the structure of the assessment is as
intended, based on the relationships between the test items and the construct(s).

Among all the ICC assessments in Table 3, more than 10 assessments reported a single overall score to test takers, and 
five of them provided evidence to support the unidimensional structure of the assessment. Graf and Mertesacker (2009) 
fitted a one-factor model to data from the Nonverbal Communication Competence Scale, and the results suggested that 
all items were measuring the same construct. Arasaratnam (2009) and Olebe and Koester (1989) also provided similar 
evidence for the Intercultural Communication Competence test and the BASIC test, respectively. 

For assessments that report subscale scores, about half provided evidence to support the multidimensional structure
of the assessment. For example, the CFA results from Wang et al. (2003) suggested that the four subscales of the Scale of
Ethnocultural Empathy adequately measured the intended constructs, and the four factors shared approximately
81% of the total variance. Hammer et al. (2003) also reported good fit for a five-factor model of the Intercultural
Development Inventory. However, a multidimensional structure of assessments is not always supported by the data.
For instance, Davis and Finney (2006) found weak support for the four-factor model originally proposed for the CCAI.
Nguyen, Biderman, and McNary (2010) also found that each item from the CCAI loaded on a general factor (i.e., cross-cultural
adaptability) and on one of nine group factors (e.g., emotional resilience, flexibility/openness, and personal autonomy).
These group factors represented the constructs that were not accounted for by the general factor. Therefore,
even though the CCAI reports four subscale scores, the results from the two studies did not support a four-dimensional
structure of the assessment. In sum, evidence supporting the multidimensional structure of existing ICC measures is not
as strong as desired.

Table 3 (excerpt)
Test | Reliability | Validity
Multicultural Personality Questionnaire (MPQ) | Subscale α = .68-.87 | Internal structure: Four factors with eigenvalues
greater than 4 emerged. Relationship with other assessments: Correlations with Big Five and Need for Change were significant
at p < .05 except flexibility with agreeableness and […]
[…] | […] | […] (ICCs), based on 122 reports that contained 184 independent samples, was rICC = .274. For socially sensitive
topics, the predictive validity of self-report measures was remarkably low and the incremental validity of IAT measures
was relatively high.
Global Competence Aptitude Assessment | No reliability information available. | Surveyed international educators as well
as human resource professionals at multinational corporations to identify critical elements of global competence.
[…] | […] α = .73 for relational skills, .69 for perceptual acuity, .66 for empathy, .70 for adaptability, .56 for
tolerance of uncertainty. | […]

Further, about half of the ICC assessments reviewed in this paper did not report evidence of adequate internal structure.
Best practices for scale construction suggest that this evidence is ideally provided by demonstrating good model fit of an
item-level factor analysis. For example, the Global Competencies Inventory (GCI; Bird, Stevens, Mendenhall, & Oddou,
2002) reported only the correlations among its three subscores rather than evidence on the measure's internal structure.
This lack of evidence about the structure of the scale represents a significant gap in validity evidence and thus a
particularly notable weakness.


Validity Evidence Regarding Relationships With Conceptually Related Constructs 

The second aspect of validity evidence is the relationship with conceptually related constructs, traditionally known as
convergent and discriminant validity. A correlation coefficient between two assessments is typically used to estimate the
degree to which the constructs they measure are related. According to the Standards (AERA et al., 2014), a valid
assessment should correspond with relevant constructs and be distinguishable from irrelevant constructs. Because the
correlation coefficient is affected by the reliability of both assessments (i.e., low reliability lowers the observed
correlation below the level it would reach if reliability were high), it is important to report reliability information
along with the correlation coefficient. Overall, about half of the existing ICC assessments reviewed in this study
provided some evidence concerning a relationship with related constructs.

Research with the popular cultural intelligence construct has produced fairly ample evidence, primarily from organizational
samples (Leung et al., 2014) but also from educational contexts. For example, Erez and colleagues (Erez et al., 2013; Lisak
& Erez, 2015) conducted two studies using the Cultural Intelligence Scale (Ang, Van Dyne, & Koh, 2006; Ang et al.,
2007) with students participating in cross-cultural virtual team projects. The results demonstrated a strong relationship
(r = .50) between the cultural intelligence of students in global virtual teams and their sense of belonging to a global
context, termed global identity (Erez & Gati, 2004). The researchers measured global identity with a validated and
adequately reliable Global Identity Scale (α = .85; Erez & Gati, 2004; Shokef & Erez, 2006, 2008). One of the studies
further connected cultural intelligence to openness to cultural diversity (r = .16) and leadership emergence (r = .56;
Lisak & Erez, 2015). Providing some evidence of an antecedent in the nomological network of ICC, other research with this
scale connected it to expectancy disconfirmation after cooperative intercultural contact (Rosenblatt et al., 2013).

In a study by Hammer et al. (2003), the authors confirmed the theoretically postulated relationships among the subscales
of the Intercultural Development Inventory (IDI; α = .80-.85) and two related assessments: the Worldmindedness
Scale (α = .67) and the Intercultural Anxiety Scale (α = .86). Higher scores on the denial/defense subscale of the IDI were
related to lower scores on the Worldmindedness Scale (r = −.29) and higher scores on the Intercultural Anxiety Scale
(r = .16).
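To illustrate why reliability should be reported alongside correlations, the classical (Spearman) correction for attenuation divides an observed correlation by the square root of the product of the two reliabilities. The sketch below applies it to the Hammer et al. (2003) values reported above. Which IDI reliability applies to the denial/defense subscale is an assumption (the low end of the reported range is used), so the corrected value is illustrative only.

```python
from math import sqrt


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation.

    Estimates the correlation between true scores from an observed
    correlation r_xy and the reliabilities of the two measures.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / sqrt(rel_x * rel_y)


# Observed r = -.29 (IDI denial/defense vs. Worldmindedness); Worldmindedness
# alpha = .67. ASSUMPTION: .80, the low end of the reported IDI range, stands
# in for the denial/defense subscale reliability.
r_corrected = disattenuate(-0.29, 0.80, 0.67)
print(round(r_corrected, 2))
```

The corrected estimate (about −.40) is larger in magnitude than the observed −.29, showing how modest reliability, here chiefly the Worldmindedness Scale's α = .67, understates the underlying relationship. Latent correlations from structural equation modeling address the same problem within a measurement model rather than through this post hoc formula.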

Structural equation modeling, which models error terms in order to isolate the latent construct, offers another, more
robust method of supporting relationships among measures. Instead of calculating the correlation coefficient from
observed scores, Nguyen et al. (2010) used structural equation modeling to examine the relationship between the CCAI and
Goldberg's IPIP Big Five questionnaire (Goldberg, 1999). The results showed weak to moderate correlations between the two
assessments (r = .18-.55), suggesting that test takers with better cross-cultural adaptability tend to be more
extraverted, agreeable, conscientious, emotionally stable, and open to new experiences. The correlation coefficient
estimated from a structural equation model is the correlation between the latent constructs underlying the two
assessments. Unlike the observed-score correlations in the Hammer et al. (2003) study, such latent correlations are not
attenuated by measurement error, provided the measurement model is correctly specified. Therefore, structural equation
modeling is a promising method for future research to provide validity information regarding relationships with
conceptually related constructs.

Validity Evidence Regarding Relationship With Criteria

The relationship between the assessment and related criterion measures is another important aspect of validity evidence 
(AERA et al., 2014). Examples of the criteria used for existing ICC assessments include self-evaluation, peer impressions, 
job performance, and the like. Few of the assessments in Table 3 provide this type of validity evidence, perhaps due to the 
resource-heavy requirements of criterion data collection. 

Nguyen et al. (2010) examined whether the subscale scores of the CCAI would predict the number of international job
assignments when controlling for the variance of the general factor (cross-cultural adaptability). The results partially
supported the hypothesis: only two subscales (resilience and personal autonomy) were weakly correlated with the
logarithm of the number of international job assignments (r = .20 and r = .29, respectively), and no subscales were
correlated with the raw number of assignments. In a study by Matsumoto et al. (2001), participants who took the
Intercultural Adjustment Potential Scale (ICAPS) also rated themselves and all other members of their focus group on a
two-item scale of intercultural adjustment. Two interviewers also rated all participants on both items. The analysis
showed that the composite ICAPS score was significantly correlated with self, peer, and interviewer ratings (r = .69, .70,
and .66, respectively; p < .001), which supported the utility of the ICAPS in predicting intercultural adjustment. In
addition, scores on the Miville-Guzman Universality-Diversity Scale, which measures awareness and potential acceptance of
both similarities and differences in others, were not significantly related to SAT® verbal scores (Miville et al., 1999),
providing evidence of discriminant validity. However, in a U.K.-based study of students in culturally diverse teams, the
Multicultural Personality Questionnaire was found to be related to exam grades (Van der Zee, Atsma, & Brodbeck, 2004); in
particular, the flexibility component was moderately related to grades in a hierarchical linear model (z = 1.78).

In a study with 71 recruiters in a U.S. high-tech organization (Hammer, 2011), IDI scores were correlated (r = .43) with
ratings of success in meeting diversity goals for recruitment. In another funded study of study abroad students (Hammer,
2005), 1,500 students completing a 10-month homestay program organized by AFS Intercultural Programs, an American-based
study abroad facilitator, were compared to a control group (n = 638) of students who remained at their home institutions.
Students in the homestay program resided in Austria, Brazil, Costa Rica, Ecuador, Germany, Hong Kong, Italy, Japan, and the
United States. IDI scores were positively correlated with the number of intercultural friends students reported having, a
sociometric measure of experience success reflecting students' ability to build international relational networks
(Hammer, 2005). IDI scores were also related to reduced anxiety and increased satisfaction with the experience.

Other evidence suggested that the Cultural Intelligence Scale (CQS) may relate to several valued student outcomes. In 
particular, higher scores on the CQS were related to commitment to and satisfaction with international educational courses 
(e.g., Morell, Ravlin, Ramsey, & Ward, 2013; Ramsey, Barakat, & Aad, 2014), intention to work abroad (e.g., Remhof, 
Gunkel, & Schlaegel, 2013), and global virtual team leadership (Erez et al., 2013; Lisak & Erez, 2015). These outcomes, 
which fall into the category often labeled previous experience, serve as useful criteria as they have been related to global 
leadership effectiveness (e.g., Caligiuri & Tarique, 2012). Research also suggests that study abroad experiences develop 
student competencies when assessed using this scale (Engle & Crowne, 2014; Varela & Gatlin-Watts, 2013). However, the
validity evidence relating the scale to adjustment while studying abroad is mixed. One study, with international students
studying in New Zealand, indicated that the motivational subscale did not predict psychological adjustment during
study abroad (Ward, Wilson, & Fischer, 2011); another study, with a Taiwanese sample, indicated that cultural intelligence
was not related to adjustment (Lin, Chen, & Song, 2012). Note that the two studies used different adjustment scales: the
Sociocultural Adaptation Scale (Ward & Kennedy, 1999) and the Black and Stephens (1989) scale measuring work,
interactional, and general adjustment. The Black and Stephens scale is commonly used but has several measurement
concerns, including a lack of proper validation evidence (Thomas & Lazarova, 2006).

Summary of Reliability and Validity Evidence 

The review of the reliability evidence for existing ICC assessments suggests no major issues with reliability at the total
test level. All the assessments in Table 3 reported reliability evidence suggesting satisfactory reliability at the test
level; however, some minor issues still exist. One issue is that the subscale score reliability of five assessments was
unsatisfactory (α < .70), including the Global Perspectives Inventory, Cultural Intelligence Assessment, and CCAI. As
subscale scores are usually reported for diagnostic purposes (e.g., when used as a training tool), unreliable subscores may
result in inaccurate diagnoses and, therefore, provide misleading information to score users. Unreliable subscales mean
that error will contaminate different facets unequally and reduce the quality of any development plan based on the
scores. Further, it would be difficult to validate ICC training interventions when some subscale scores fluctuate randomly.
Another issue concerns comparability among test forms. Of the three ICC assessments in Table 3 that consisted of more than
one test form, two reported high correlations between test forms, and one did not provide any information.

Unlike the reliability evidence, the quantity and quality of validity evidence varied significantly among existing ICC 
assessments. Roughly half of the assessments in Table 3 reported validity evidence regarding internal structure, about half
reported evidence regarding the relationship with related constructs, fewer than one third reported evidence regarding the
relationship with related criteria, and only two assessments reported all three aspects of validity evidence. In addition to
quantity, the quality of some available validity evidence was also unsatisfactory. For instance, the hypothesized internal
structure of some assessments was not supported by the data, which raises questions about subscale score reporting. The
relations between some ICC assessments and their related measures were also poorly estimated due to the low reliability
of the tests.

In general, stronger validity evidence was available for some assessments developed after 2000 (e.g., the Cultural
Intelligence Scale and the IDI) and for assessments developed by organizations (e.g., the CCAI). However, for most
assessments developed 20 or 30 years ago or developed by independent researchers, validity evidence is relatively
insufficient. This lack of validity evidence may be attributable to limited resources, such as financial support or
available statistical packages, but may also reflect an outdated approach to validity. After Messick (1995) described
validity as a unitary concept for which researchers could provide various types of evidence, test developers gradually
acknowledged the importance of gathering a range of validity evidence to support test score inferences. Although more
validity research has been conducted in recent years, one aspect of validity that is still often missing is evidence
regarding the relationship with criteria. This holdover from the older approach may explain the prevalence of validity
evidence limited to a single type. In keeping with Messick, no priority was given to any type of evidence in this review;
however, the particular lack of criterion-related evidence should be highlighted. Very few measures were related to any
sort of accepted criteria. Therefore, future validity research should gather criterion information to clarify the extent
to which scores from an ICC assessment predict test takers' skills to communicate and work across cultures in authentic
situations. Criterion-related evidence is particularly convincing in terms of investment: if a strong argument is to be
built for higher education to invest in the development of these skills, then persuasive evidence of their relations to
valued outcomes will be the best foundation.


Challenges in Designing an Intercultural Competence Assessment 
Confounds and Issues With Self-Report Measures 

Self-report measures are a versatile tool suited for capturing attitudes and declarative knowledge (Gabrenya, Griffith,
Moukarzel, Pomerance, & Reid, 2012). For the assessment of ICC, however, sole reliance on self-report measures presents
several challenges. First, self-report scores may be confounded with student experience levels. The typical young adult
will have had limited exposure to multicultural environments and little experience reflecting on the skills and behaviors
that make up ICC. Thus, items that rely on previous experience may be adversely affected by this lack of exposure. Other
confounds include cognitive biases, in particular future-oriented optimism (e.g., Bazerman, 1990), which may further
complicate self-report as students respond to items based on their most idealistic self. Additionally, self-report items
may be inappropriate for assessing interaction tendencies and other skill components of ICC.

Moreover, although current self-report assessments seem to reliably measure the attitudinal components of ICC, faking
presents an additional challenge for self-report measures (Likert-scale responses). Faking, the tendency of respondents to
deliberately provide inaccurate responses or self-descriptions to make themselves appear more attractive, interesting, or
valuable, is a critical concern in self-report attitudinal measures such as ICC assessments. As previous research has
demonstrated a large impact of faking on test results (d = 0.48 to d = 3.34; Viswesvaran & Ones, 1999), researchers have
attempted to control for it by (a) identifying faking and making statistical adjustments and (b) developing item types
that make it more difficult for respondents to fake.
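The d values cited above are standardized mean differences between motivated and honest responding. A minimal sketch of that computation is shown below with hypothetical scores; it uses the pooled standard deviation, and other standardizers (for example, for within-person designs) are also used in the faking literature.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(motivated, honest):
    """Standardized mean difference (motivated minus honest) using the
    pooled sample standard deviation of the two conditions."""
    n1, n2 = len(motivated), len(honest)
    pooled_var = ((n1 - 1) * variance(motivated)
                  + (n2 - 1) * variance(honest)) / (n1 + n2 - 2)
    return (mean(motivated) - mean(honest)) / sqrt(pooled_var)


# Hypothetical scale means (1-5 Likert) for five respondents per condition.
honest = [3.0, 3.5, 2.5, 4.0, 3.0]
motivated = [3.5, 4.0, 3.0, 4.5, 3.5]
d = cohens_d(motivated, honest)
print(round(d, 2))  # 0.88
```

Here every motivated score is half a point higher, and dividing that shift by the pooled SD expresses it in standard-deviation units, the metric in which the 0.48 to 3.34 range above is reported.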

Faking

Self-report respondents can engage in faking behaviors intentionally and unintentionally. For many years, faking behavior 
was conceptualized as socially desirable responding. Seminal work by Paulhus (1984) suggested that social desirability 
comprises two components: self-deceptive enhancement (SDE) and impression management (IM). SDE was considered an 
unconscious form of social desirability that is associated with a positive outlook (Taylor & Brown, 1988). IM, on the other 
hand, is an intentional attempt at deception (Paulhus, 1984). It is likely that this two-factor structure of social desirability 
was implicitly extended to faking behavior because of the literature’s close association of the two phenomena. More recent 
faking research now makes a distinction between unintentional misrepresentation, which is akin to bias, and intentional 
applicant faking behavior (e.g., McFarland & Ryan, 2006; Sackett, 2011). In the case of SDE, the source of bias is a general 
tendency to have positive views of oneself (Taylor & Brown, 1988). Other biases may also contribute to inflated scores
under motivated conditions. For example, the future orientation cognitive bias leads respondents to answer items about
the future more positively than items about the past (Taylor, 1989). Extreme response styles (e.g., using only the ends of
a Likert scale) can also distort self-report data (Johnson, Shavitt, & Holbrook, 2011). Even when unintentional, faking
behavior still poses a minor threat to validity because it introduces additional error variance. This error variance is
unlikely to be uniform across respondents, so unintentional distortion will likely produce small decrements in validity by
introducing variance unrelated to the target construct; practically significant drops in validity, however, are unlikely.
Owing to this shift in the conceptualization of faking behavior and the low severity of its psychometric consequences,
most attention is now focused on intentional faking (Ziegler, MacCann, & Roberts, 2011).

Significant differences in responses across motivated and unmotivated conditions have provided evidence for intentional
faking behavior. R. L. Griffith, Chmielowski, and Yoshita (2007) investigated within-person differences in faking
behavior across settings. They asked participants to complete a measure of conscientiousness as part of an actual
employment application process. Afterward, the researchers contacted the participants and instructed them to complete the
same measure as honestly as possible, with the reassurance that the second version was for research purposes only. The
researchers found significant within-person differences between mean scores in the applicant condition and mean scores in
the honest condition, F(2, 59) = 42.32, p < .001, suggesting that people can and do intentionally alter their responses to
portray themselves in a more positive light when motivated to do so (R. L. Griffith & Peterson, 2008). This finding
suggests that, depending on the environment, test takers are not always honest, accurate, or both on self-report tests.
The pattern of within-subject score inflation has been replicated when data were collected in the same fashion (e.g.,
Arthur, Glaze, Villado, & Taylor, 2010; Peterson, Griffith, Isaacson, O'Connell, & Mangos, 2011). R. L. Griffith and
Converse (2011) synthesized the empirical literature via statistical analyses, simulations, and logical deduction and
estimated that, on average, 30% of applicants (±10%) engage in faking behavior. The impact of faking behavior is
substantial, with decrements in internal (Chaney & Christiansen, 2004) and external validity metrics (e.g., Komar, Brown,
Komar, & Robie, 2008; Peterson et al., 2011). Some of the decrement to validity may be artifactual, a result of
nonlinearity in the data (Peterson & Griffith, 2006). Applicants who increase their scores, but perform at the level
predicted by their true score, provide data points that function as outliers. Essentially, the fakers' data points are
shifted toward the higher end of the personality score distribution, but their performance is not commensurate with this
positive shift in scores. This deviation from the monotonic relationship between personality and performance results in a
nonlinear artifact that attenuates the correlation between the personality measure and the outcomes of interest (Peterson
& Griffith, 2006). Other contributing factors to the attenuation of predictor-criterion relationships may be more
substantive in nature. Some research has demonstrated a significant relationship between applicant faking and
counterproductive behaviors in the workplace (Peterson et al., 2011).

Administering External Items 

One approach to controlling for faking consists of administering external items that are unrelated to the construct of 
interest (e.g., ICC) and do not count toward the examinee’s score. Currently, there are two types of external items: (a) 
bogus and (b) social desirability items. Bogus external items are ones that appear to be related to the construct (e.g., 
ICC), trait, skill, or task of interest, but the objects or scenarios described in the items do not actually exist (e.g., “How 
often do you utilize murray-web system to locate unpublished research articles?”; where the murray-web system does not 
exist; Dwight & Donovan, 2003, p. 10). In contrast, social desirability items measure the tendency to answer questions in a manner that is perceived to be viewed favorably by others. Consistent endorsement of either item type may suggest that respondents are providing inauthentic or faked responses.

Directions: Out of the three statements, select one that describes you MOST accurately and one that describes you LEAST accurately.
Response columns: MOST like me, LEAST like me
Statements:
  I am relaxed most of the time
  I start conversations
  I catch on to things quickly

Figure 2 A forced-choice item asks the respondent to choose from one of two or more options that appear equally desirable.

Even though social desirability items are often used as proxies for faking behavior, research has suggested that they are ineffective at identifying and controlling for faking (R. L. Griffith & Peterson, 2008). This research analyzed the validity of social desirability as a proxy for within-subject score change across motivated and unmotivated conditions. J. E. Hunter and Schmidt (2004) proposed that the quality of a proxy variable could be determined by multiplying the reliability of the proxy measure by the correlation between the proxy measure and the variable of interest. Using this proxy variable estimation, R. L. Griffith and Peterson (2008) reported that the operational quality of a measure of social desirability as a proxy for faking was poor (between .08 and .11, interpreted similarly to a corrected correlation coefficient). Measures of social desirability are often self-report and demonstrate adequate reliability; however, the correlations between measures of social desirability and within-subject score change are quite low and, in some instances, negative (R. L. Griffith, Malm, English, Yoshita, & Gujar, 2006). Thus, the low proxy index reported by R. L. Griffith and Peterson reflected the small amount of variance that social desirability shares with score change more than it reflected error variance. In general, social desirability items are no longer viewed as a useful tool to assess and correct for faking behavior.
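As a worked illustration of the proxy index described above (the proxy’s reliability multiplied by its correlation with the variable of interest), the sketch below uses hypothetical values, not figures from R. L. Griffith and Peterson (2008). It shows why a highly reliable proxy can still yield a poor index when its correlation with score change is low.

```python
def proxy_quality(reliability: float, r_proxy_target: float) -> float:
    """Proxy quality index as described in the text: the proxy's reliability
    multiplied by its correlation with the variable of interest."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if not -1.0 <= r_proxy_target <= 1.0:
        raise ValueError("correlation must lie between -1 and 1")
    return reliability * r_proxy_target


# Hypothetical values: a reliable social desirability scale that correlates
# weakly with within-subject score change, versus a less reliable proxy that
# shares more variance with score change.
reliable_but_unrelated = proxy_quality(0.85, 0.10)
less_reliable_but_related = proxy_quality(0.60, 0.40)
print(round(reliable_but_unrelated, 3), round(less_reliable_but_related, 3))
```

The low correlation, not the reliability, drives the first index down; a negative correlation would make the index negative.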

When using external items, two approaches are available to control for the impact of faking on test scores: (a) deletion of the data from respondents deemed to be faking and (b) statistical adjustment. The first approach is the older of the two and consists of setting an a priori threshold for the number or percentage of bogus or social desirability items endorsed. If examinees exceed this threshold, they are deemed to be faking, and their data on the assessment of interest are deleted entirely. The second approach is to compute corrected scores for respondents who provide inauthentic responses by regressing trait scores (e.g., ICC) on social desirability scores and using the residuals as the corrected scores. This approach attempts to partial out variance associated with social desirability from the construct of interest (ICC); however, research has shown that this partialing may remove meaningful variance, which decreases the validity of the measure (e.g., Soubelet & Salthouse, 2011).
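Both adjustment approaches can be sketched in a few lines. The threshold (50% of bogus items), the scores, and the function names below are hypothetical; the residualization is ordinary least squares of trait scores on social desirability scores. As the text cautions, the residuals may also strip out meaningful trait variance.

```python
from statistics import mean
from typing import List


def flag_by_threshold(endorsed: List[int], n_items: int, max_prop: float) -> List[bool]:
    """Approach (a): True for respondents whose proportion of endorsed bogus
    (or social desirability) items exceeds an a priori threshold."""
    return [count / n_items > max_prop for count in endorsed]


def residualize(trait: List[float], sd_scores: List[float]) -> List[float]:
    """Approach (b): OLS regression of trait scores on social desirability
    scores; the residuals are trait scores with the linear SD part removed."""
    mx, my = mean(sd_scores), mean(trait)
    sxx = sum((x - mx) ** 2 for x in sd_scores)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sd_scores, trait))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(sd_scores, trait)]


# Hypothetical data for five examinees.
icc_scores = [3.2, 4.1, 3.8, 4.6, 2.9]
sd_scores = [10.0, 14.0, 12.0, 16.0, 9.0]
bogus_endorsed = [0, 3, 1, 4, 0]  # out of 5 bogus items

flags = flag_by_threshold(bogus_endorsed, n_items=5, max_prop=0.5)
retained = [s for s, flagged in zip(icc_scores, flags) if not flagged]  # approach (a)
corrected = residualize(icc_scores, sd_scores)                          # approach (b)
```

By construction, the residuals sum to zero and are uncorrelated with the social desirability scores.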

Employing Alternative Item Types 

As the use of external items merely attempts to identify faking behavior, researchers have attempted to apply alternative 
item types (i.e., non-Likert items) to make it more difficult for examinees to fake. Such an approach does not purport to 
completely eliminate faking and still involves the use of self-report, but it does aim to reduce it. For this purpose, two item 
types have been proposed: (a) SJT and (b) forced-choice items. As described previously, SJTs present a respondent with 
a task-related situation, which can be in written, video-based, or multimedia format, and they ask the respondent how 
she or he would theoretically respond (i.e., not based on actual behavior) by choosing from a list of options (Whetzel & 
McDaniel, 2009). 

In contrast, forced-choice items ask the respondent to choose from one of two or more options that appear equally 
desirable (Christiansen, Burns, & Montgomery, 2005). As an example, Brown and Maydeu-Olivares (2011) developed a 
forced-choice triad item for a Big Five personality inventory (see Figure 2). 
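Because a triad contains only three statements, choosing one MOST and one LEAST statement fully ranks the block, so all three pairwise comparisons are known. Brown and Maydeu-Olivares (2011) model forced-choice responses through binary pairwise outcomes of this kind within a Thurstonian item response framework. The sketch below shows only that coding step, not the model estimation; the short statement labels and the function name are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def triad_to_pairwise(items: Sequence[str], most: str, least: str) -> Dict[Tuple[str, str], int]:
    """Code a MOST/LEAST response to a three-statement block as pairwise
    outcomes: 1 if the first statement in the pair was ranked higher."""
    if len(items) != 3 or most == least or most not in items or least not in items:
        raise ValueError("need a triad with distinct MOST and LEAST choices")
    middle = next(i for i in items if i not in (most, least))
    rank = {most: 1, middle: 2, least: 3}  # 1 = most like me
    return {(a, b): int(rank[a] < rank[b]) for a, b in combinations(items, 2)}


# The Figure 2 triad, with short hypothetical labels for each statement.
triad = ("relaxed", "starts conversations", "catches on quickly")
outcomes = triad_to_pairwise(triad, most="starts conversations", least="relaxed")
```

Summing such outcomes into fixed block totals yields ipsative scores, one source of the scoring concerns raised in the next paragraph.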

Although both SJTs and forced-choice items have been proposed as item types that can reduce faking, more research has been conducted on the latter. Specifically, compared with Likert items, forced-choice items have been shown to significantly reduce the impact of faking on mean scores, by as much as 0.68 standard deviations (Jackson, Wroblewski, & Ashton, 2000; Martin, Bowen, & Hunt, 2002). However, forced-choice items have two limitations relative to Likert items: (a) They require a greater number of items, and (b) they raise a number of psychometric concerns related to scoring. Regrettably, very little research has investigated whether using forced-choice items is worthwhile in low-stakes testing contexts, where the impact of faking is uncertain. If faking proves to be an issue on the ICC assessment, the best approach may be to use multiple item types, particularly because forced-choice items will increase test length.


Culture-Specific Versus Culture-General Knowledge 

A known challenge to assessing the knowledge and skills associated with ICC is that they can be context dependent. For 
example, cultural knowledge is often situated within a specific culture and may require specific language skills. However, 
assessing ICC with items referencing a specific culture may be infeasible because an individual may come into contact with a number of different cultures within his or her lifetime. As a result, it may be preferable to assess culture-general knowledge, or knowledge that is useful in interpreting, coping with, and adapting to cross-cultural interactions. That is, instead of assessing how knowledgeable an individual is about the cultural norms and practices of a particular country or region, the more desirable approach may be to assess an individual’s recognition that a new situation may be influenced by cultural differences. This recognition is largely developed through a cultural schema, which is a mental structure, framework, or system that is used to understand how personal background, values, and beliefs impact cross-cultural interactions (Brenneman et al., 2016). This culture-general position has also gained ground in the cross-cultural training literature (e.g., Brandi & Neyer, 2009). Thus, scenario-based items may be more appropriate than self-report items, an issue discussed in the next section.


Capturing the Interactional Component of Intercultural Competence 

One of the challenges of assessing ICC is that the construct is composed of attitude, knowledge, and skill subdomains 
that require an interpersonal interaction to occur in order to be assessed. As an example, an individual may have to 
realize that he or she is in a situation where cultural differences may be influential, hypothesize how the situation is going 
to unfold, decide how to behave, and take a course of action (Brenneman et al., 2016). Such an interaction is dynamic 
in nature and must be simulated through a scenario. However, building such scenarios requires a heavy expenditure of 
resources, complete with high development costs and overhead. The aforementioned BaFa’ BaFa’ takes about 2 hours for 
20 people to complete, making it a logistical challenge to administer with even the smallest collegiate population. Although 
video- or avatar-based simulations represent one exciting potential alternative to in-person simulations, they, too, require 
a substantial investment of time and money. An additional option could be to use SJTs. This method of assessment has 
been attempted in the Cultural Intelligence Assessment (Thomas et al., 2015), but limited validation evidence prevents firm inferences about the use of this technique. Moreover, some scholars argue that even a simulated scenario fails to mimic
the dynamic nature in which ICC is negotiated between two or more parties. In sum, assessing the real-world dynamic of 
ICC is a great challenge that requires creativity, particularly when considering practical constraints, although some recent 
projects are making strong inroads using virtual platforms. 


Inadequate Predictive Validity 

Because ICC is a complex skill, it is sometimes difficult to find an appropriate criterion to evaluate the predictive validity of an ICC assessment. As previously discussed, the existing ICC assessments were developed for various purposes; thus, the choice of criterion in current validity research varies considerably. This variability raises a concern regarding the reliability of the criterion measures, given that a poor measure of the criterion may hinder validity evidence. Therefore, one challenge is to define ICC in higher education and to identify acceptable and reliable criterion measures for establishing predictive validity evidence. One purpose of measuring college students’ ICC as a learning outcome is to predict whether they will be able to communicate and work effectively in an organization with a global mission. At this point, however, it is unclear whether such organizations would provide information about their current employees’ communication capacity and work efficiency to support evidence of predictive validity. Given these challenges, obtaining criterion measures will be an ongoing process, and establishing predictive validity evidence for ICC assessments in higher education may require longitudinal research.
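One way to see why an unreliable criterion hinders validity evidence is the classical correction for attenuation in the criterion, r_c = r_xy / √r_yy, where r_xy is the observed predictor-criterion correlation and r_yy is the criterion’s reliability. The sketch applies it to hypothetical values; the report does not prescribe this correction, and a corrected coefficient estimates validity against a perfectly reliable criterion rather than substituting for better criterion measures.

```python
import math


def correct_for_criterion_unreliability(r_xy: float, r_yy: float) -> float:
    """Classical correction for attenuation due to criterion unreliability only:
    r_corrected = r_xy / sqrt(r_yy)."""
    if not 0.0 < r_yy <= 1.0:
        raise ValueError("criterion reliability must lie in (0, 1]")
    return r_xy / math.sqrt(r_yy)


# Hypothetical: an observed validity of .25 against a job performance rating
# whose reliability is .64.
validity_corrected = correct_for_criterion_unreliability(0.25, 0.64)
print(round(validity_corrected, 3))
```

The same observed correlation therefore understates validity more severely the less reliable the criterion is.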
Summary

These measurement concerns (respondent faking, inadequate predictive validity, and incorporation of the interactional and culture-general domains without overreliance on culture-specific content) challenge those seeking to assess ICC. Furthermore, conceptual concerns regarding existing ICC models also complicate the task. A useful framework for ICC must provide specific definitions, clearly delineate between the construct and its nomological network, incorporate both the cognitive and noncognitive subdimensions, and clarify the relationships between the subdimensions. Moreover, such a framework offers the most utility when constructed to redress the measurement concerns described herein. For these reasons, we develop a new framework designed to overcome both sets of concerns.

A Proposed Framework for Intercultural Competence in Higher Education 
Operational Definition of Intercultural Competence 

Synthesizing the models from which the reviewed scales were created (e.g., Ang et al., 2007) as well as empirical research 
(e.g., Abbe, Gulick, & Herman, 2007), we propose a framework and operational definition to serve as the basis for the 
development of a new assessment of ICC (Table 4). We propose a new framework here for several reasons. First, many 
existing frameworks do not offer insights on how to translate the theoretical definitions into actual assessments, which 
may have contributed to the difficulty in accumulating validity evidence. The proposed framework aims to provide an 
elaborated discussion of assessment considerations that may better guide the development of an operational assessment. 
Second, academic experts on ICC remain divided, such that many existing models have no widespread support outside 
of their own particular camp of researchers. This tendency is apparent in the trend for ICC validity evidence to be collected primarily by those whose names are attached to the development of the assessment (e.g., Ang et al., 2007). Third, developing a new model provides the opportunity to tailor it to the purpose of the assessment and its target population (i.e., higher education), focusing on developable skills and excluding components that are less directly related to successful achievement of intercultural goals. More important, generating a new model creates the opportunity to address the
various concerns regarding construct validity discussed in the previous sections. For example, we theorize that the ability 
to acquire declarative cultural knowledge is less predictive of success than the ability to apply relevant cultural knowledge 
during an intercultural interaction. Thus, we propose the following framework. 

To begin, we draw on a definition from prior research: ICC “reflects a person’s capability to gather, interpret, and act upon these radically different cues to function effectively across cultural settings or in a multicultural situation” (Earley & Peterson, 2004, p. 105). Next, we propose a framework that builds on a process model of social thinking (Grossman, Thayer, Shuffler, Burke, & Salas, 2015). That process model breaks individual behavior in a complex social situation down into four stages (scan, appraise, interpret, and interact) and the cognitive and behavioral skills that support them. Adapting it to the intercultural context, we split cross-cultural interactions into three stages, approach, analyze, and act (see Figure 3), and specify the skills necessary to support successful performance in each stage. These stages serve as the dimensions of the framework. The approach dimension includes the characteristics that affect the likelihood that an individual will initiate and maintain intercultural contact voluntarily, as well as the traits that define the overall positivity with which an individual responds to cross-cultural interactions. These characteristics include a positive cultural orientation, a tolerance for ambiguity, and cultural self-efficacy. The analyze dimension captures an individual’s ability to take in, evaluate, and synthesize relevant information without the bias of preconceived judgments and stereotyped thinking. It includes the following subdimensions: self-awareness, social monitoring, perspective taking/suspending judgment, and cultural knowledge application. The act dimension captures individuals’ ability to translate the conclusions reached in the analyze stage into action while maintaining control in potentially challenging and stressful situations. It includes behavior regulation and emotion regulation. The following sections provide more detail about the nature of each trait and skill. Operational definitions can be found in Table 4.
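For assessment design, the framework can be encoded as an ordered blueprint so that each task can be tagged with the stage and subdimension it targets. The sketch below is a hypothetical encoding: it uses only the stages and subdimensions named in the text, and its ordering reflects the loose sequential relationship between stages that the framework proposes.

```python
from typing import List, Tuple

# Stages in their loose sequential order, each with the subdimensions the text
# assigns to it (hypothetical encoding for tagging assessment tasks).
FRAMEWORK: Tuple[Tuple[str, Tuple[str, ...]], ...] = (
    ("approach", ("positive cultural orientation", "tolerance for ambiguity",
                  "cultural self-efficacy")),
    ("analyze", ("self-awareness", "social monitoring",
                 "perspective taking/suspending judgment",
                 "cultural knowledge application")),
    ("act", ("behavior regulation", "emotion regulation")),
)


def stage_of(subdimension: str) -> str:
    """Return the stage (dimension) to which a subdimension belongs."""
    for stage, subdimensions in FRAMEWORK:
        if subdimension in subdimensions:
            return stage
    raise KeyError(subdimension)


def upstream_stages(stage: str) -> List[str]:
    """Stages whose outcomes the given stage is assumed to depend on."""
    names = [name for name, _ in FRAMEWORK]
    return names[: names.index(stage)]
```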

Approach 

As specified above, this dimension includes a positive cultural orientation, tolerance for ambiguity, and cultural self-efficacy. Although similar to a general positive attitude toward intercultural situations, a positive cultural orientation is a consolidated representation of several related concepts in the literature. These concepts include cosmopolitanism (i.e., reduced ethnocentrism; Beechler & Javidan, 2007; Levy, Beechler, Taylor, & Boyacigiller, 2007), open-mindedness (Terrell & Rosenbusch, 2013), inquisitiveness (Black, Mobley, & Weldon, 2005), as well as curiosity and respect for other cultures (Beechler & Javidan, 2007). Evidence also suggests that such orientations or attitudes can be changed (Ajzen, 2001). For example, global leadership development programs have been found to foster open-mindedness through participants’ genuine curiosity and an attitude of discovery and exploration (Terrell & Rosenbusch, 2013). Taken together, this evidence suggests that a positive cultural orientation is not only malleable but may also predict competencies similar to ICC, such as intercultural sensitivity and global leadership effectiveness (Cushner, 1986; Terrell & Rosenbusch, 2013).

Figure 3 The conceptual model of the approach, analyze, and act intercultural competence framework.

The second subdimension of approach, a tolerance for ambiguity, is repeatedly identified as essential to ICC due to 
the inherent nature of interacting with individuals from different cultural backgrounds (e.g., Caligiuri & Tarique, 2012). 
Differences in behaviors, assumptions, communication, and the resulting inability to anticipate potential situations all 
contribute to the ambiguous nature of intercultural interactions (Lane, Maznevski, & Mendenhall, 2004). Individuals 
who can tolerate ambiguity not only function effectively in spite of stress (Caligiuri & Tarique, 2012) but are also less negatively affected by the stress of the intercultural interaction and more likely to remain engaged in, and even seek out, these situations. Therefore, because of the inherent uncertainty associated with cross-cultural interactions, tolerance for ambiguity is an important subdimension of the approach dimension.

Cultural self-efficacy is the last subdimension of approach. Self-efficacy influences the challenges in which an individual 
chooses to engage and his or her attitude toward those challenges. For example, an individual with high self-efficacy in 
intercultural situations believes that he or she can develop a strong rapport with someone from another culture. Because 
of this perception, the individual is more likely to initiate and engage in interactions that require development of rapport 
with culturally different others. In this way, an individual’s level of ICC depends in part on his or her evaluation of his or her own abilities.

Analyze 

This dimension includes self-awareness, social monitoring, suspending judgment, perspective taking, and cultural knowledge application. Self-awareness requires individuals to consider themselves both as individuals and as members of their own culture. Highly self-aware individuals are capable of dissecting their worldview to identify the influences of their personal history as separate from the influences of their culture, and they understand that people from different backgrounds will have different worldviews (Reid, Kaloydis, Sudduth, & Greene-Sands, 2012).

Social monitoring includes the ability to infer social norms, hierarchies, and interpersonal relationship networks (e.g., 
Lodder, Scholte, Goossens, Engels, & Verhagen, 2016). Evidence from neuropsychology suggests that we use social cues, 
such as expressions, as information to evaluate our performance (Boksem, Ruys, & Aarts, 2011). In the absence of familiar norms, then, social monitoring can provide necessary information to supplement missing native knowledge and evaluate the success of one’s chosen course of action, making it a necessary skill for engaging in novel cross-cultural situations.

Suspending judgment and perspective taking are two complementary skills that involve processing situational information without strong personal bias. An individual who suspends judgment sets aside his or her stereotyped or heuristic thinking; perspective taking replaces these thought patterns with effortful cognitions regarding the other person’s viewpoint, motivation, and assumptions. In doing so, individuals reduce their reliance on their own cultural schema in order to act on their understanding of a cultural other’s viewpoint.

Cultural knowledge application requires individuals to consider a broad range of information including culture-general 
information (e.g., cultural value dimensions; Hofstede, 1980), culture-specific information (e.g., French greetings), and 
historical as well as geopolitical information (e.g., the trends of power and privilege; Hammer, 2012). This skill explicitly 
refers to the ability of individuals to actively seek and use cultural information in their evaluation and decision-making 
processes. 

Act 

This dimension includes behavior regulation and emotion regulation. Behavior regulation is essential to ICC because 
behavior patterns considered normal in one culture may be inappropriate in cross-cultural situations. Individuals skilled
at behavior regulation would be able to suppress any familiar behaviors inappropriate to the cultural context, generate the 
appropriate behavior for that situation, or perhaps choose not to engage in any behavior at all (e.g., Ang et al., 2007). 

Emotion regulation allows individuals to control which emotions they experience, how and when they experience 
them, and how and when they are expressed (Gross, Salovey, Rosenberg, & Fredrickson, 1998). Cross-cultural experiences are inherently emotional (e.g., Haslberger, Brewster, & Hippler, 2013; Shaffer, 2012), and evidence suggests that individuals with strong emotion regulation abilities can act more effectively in cross-cultural situations than those with weaker emotion regulation abilities (Haslberger et al., 2013).

The current framework aims to address the particular construct validity challenges of ICC and the criteria highlighted in previous sections (see Validity Evidence Regarding Relationships With Conceptually Related Constructs). First, this framework is grounded in a definition of ICC that offers more clarity and distinguishes it from similar constructs, such as global leadership. Second, the framework is comprehensive; its subdimensions include skills encompassed in other frameworks (e.g., Reid et al., 2012). The framework also expands the comprehensiveness
of ICC by including cognitive and noncognitive elements. Third, it addresses the need to clarify relationships among 
dimensions. For example, despite strong validity evidence, the equally comprehensive cultural intelligence model (Earley 
& Ang, 2003) lacks theoretical explanations of the interplay between subdimensions. By basing the current model on a 
process model of individual behavior in complex social situations (Grossman et al., 2015), we highlight the dependent 
nature of the dimensions, implying a loose sequential relationship in which success in a later stage is dependent on the 
outcomes of an earlier stage. In sum, the present framework meets the three criteria (definition clarity, comprehensiveness, 
and subdimension relationship clarity) called for in the ICC literature. 
Table 5 Task Types, Descriptions, and Potential Response Formats

Task Type | Description | Response formats
Cross-cultural scenario-based items | Participants will view videos or read about situations and respond to a series of questions. Questions may range from ranking the most useful information provided and evaluating appropriate behavioral responses (i.e., cultural knowledge application) to indicating likely perspectives of the individuals involved in the scenario (i.e., perspective taking). | Multiple-choice; Short answer; Likert-type; Multiple selected-response
Comma switch | After baseline typing performance has been assessed, individuals must retype a paragraph, swapping the periods and the commas. | Text entry
Likely to be true | Based on a description of a fictitious character, individuals rate the likelihood of statements being true. Statements will range from directly related to the information (i.e., enjoying activities similar to ones suggested in the profile) to more stereotypical statements based on cultural membership. | Multiple-choice; Short answer; Likert-type
Spot the stereotype | Individuals read a paragraph and must select the sentences that are most strongly based on stereotypes. | Multiple-choice
Go/no-go | Individuals respond by clicking as directed in response to two stimuli. | Text entry
Flanker | Individuals respond by clicking as directed in response to stimuli. | Text entry
Emotional induction | Participants will be exposed to video clips to alter their mood; attitudes or skills could then be reassessed. | Likert-type; Short answer
Troy et al. (2010) paradigm | Prior to exposure to a video clip designed to induce sadness, participants are instructed on an emotion regulation strategy. Emotion is measured before and after. | Likert-type; Short answer
Incident recollection | Participants respond to prompts with a short written answer that is assessed using keyword counts. | Short answer
Coaching task | Participants will be asked to resolve the cross-cultural difficulty or conflict experienced by a friend. | Selected-response; Multiple selected-response (chat/nonchat based)
BASIC prompts | Individuals will respond to a variety of prompts, including statements (i.e., self-report items) and conditional reasoning questions. | Multiple-choice; Forced-choice
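The incident recollection task in Table 5 is scored with keyword counts. A minimal sketch, assuming single-word, case-insensitive, whole-word matching; the keyword list and response are hypothetical, and a real scoring key would come from a validated rubric.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def keyword_counts(response: str, keywords: Iterable[str]) -> Dict[str, int]:
    """Count case-insensitive, whole-word occurrences of single-word keywords."""
    tokens = re.findall(r"[a-z']+", response.lower())
    counts = Counter(tokens)
    return {kw: counts[kw.lower()] for kw in keywords}


# Hypothetical rubric keywords and response, for illustration only.
KEYWORDS = ["perspective", "understand", "culture", "adjust"]
answer = ("I tried to understand her perspective and adjust; her culture "
          "valued indirect feedback, so I adjusted my tone.")
counts = keyword_counts(answer, KEYWORDS)
score = sum(counts.values())
```

Because matching is exact, "adjusted" is not credited to "adjust"; stemming or synonym lists would be a further design decision for the scoring key.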


Task Types and Response Formats 

In crafting an ICC framework that entails assessing attitudes, cognitions, and behaviors, a complex assessment strategy will be necessary to adequately capture the content of each component. For that reason, a range of assessment considerations is presented in the following section, including task types and response formats. Task type refers specifically to the type of activity, question, or prompt with which examinees would interact. Examples include SJTs and emotional induction. Response format refers to the format through which the response is communicated, such as short answer or multiple-choice. It should be noted that the tasks we propose are not limited strictly to intercultural interactions, especially in the approach stage, as subdimensions such as tolerance for ambiguity are relevant in many situations in addition to intercultural interactions. However, when specifically measuring the ICC construct, tasks will explicitly reference elements of culture to best tap that domain. Table 5 contains an overview of the different task types and their potential response formats. Table 6 relates task types to the constructs of the present ICC model.

The next generation of ICC assessment requires more variety in task type. Historically, ICC has typically been assessed with self-report questions, in which respondents report their own abilities, skill level, attitude, or knowledge. As discussed above, these commonly used self-report items may be appropriate for attitudinal constructs but may be less so for cognitive and behavioral skills. Because self-report items are already common, the assessment considerations that follow focus more heavily on the cognitive and behavioral dimensions. To that end, the following section discusses several task types and their associated response formats.

Intercultural Scenario-Based Items 

Intercultural scenario-based (ICSB) items can be used to assess the appropriate behavioral response to a cross-cultural 
situation. ICSB items can be employed in the current context to focus on the specific skills of the framework, such as 
those in the analyze dimension. Potential questions in response to a situational passage or video could include those listed 
below. See Table 6 for a full list of the dimensions that could use the following item format: 

1. What is the motivation of the first speaker? (perspective taking) 

2. What additional information about the first speaker’s culture would help you determine how to act? (cultural knowledge application)

3. Which of the following claims about the first speaker is likely to be true? (suspending stereotyped thinking) 

Following the text or video that serves as the prompt for ICSB items, participants may be asked to respond using multiple-choice, Likert-type, or short answer items, each of which has strengths and weaknesses as a response format. Multiple-choice items allow multiple incorrect distractor options to be presented to the examinee, making it more challenging to determine the correct answer. Likert-type items capture attitudinal constructs such as tolerance for ambiguity, as well as individuals’ perceptions of their own abilities and their current emotional state in response to the situation. Short answer replies to open-ended questions allow for the most complex and qualitatively rich responses, in which participants generate their own unique responses. Finally, multiple items can address a single ICSB prompt, and different response formats can be used in conjunction with one another. It is important to note, however, that although the short answer response option might capture additional variance, items using this response option are resource intensive: They require the development of rubrics and two or more individuals to score written responses. However, advanced word recognition technology or other automated scoring procedures may remove the need for human scoring once the automated models have been validated. Although this technological development might require upfront resources, it could decrease the cost of administering the assessment and the time required to score it.
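Scoring short answers with two or more raters, and later validating an automated model against them, requires an agreement statistic. One common choice for ordinal rubric scores, not one prescribed by this report, is quadratic weighted kappa, which penalizes disagreements by the squared distance between score points. The sketch below is a plain implementation; the 0 to 3 rubric and the scores are hypothetical.

```python
from typing import List


def quadratic_weighted_kappa(rater_a: List[int], rater_b: List[int],
                             min_score: int, max_score: int) -> float:
    """Agreement between two sets of integer rubric scores; disagreements are
    penalized by the squared distance between score points."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    if max_score <= min_score:
        raise ValueError("the rubric needs at least two score points")
    if any(not min_score <= s <= max_score for s in rater_a + rater_b):
        raise ValueError("scores must lie within the rubric range")
    k = max_score - min_score + 1
    n = len(rater_a)
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1
    totals_a = [sum(row) for row in observed]                             # rater A marginals
    totals_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater B marginals
    disagreement = chance = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            disagreement += weight * observed[i][j]
            chance += weight * totals_a[i] * totals_b[j] / n
    if chance == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - disagreement / chance


# Hypothetical 0-3 rubric scores from a human rater and an automated model.
human = [0, 1, 2, 3, 2, 1]
machine = [0, 1, 2, 2, 2, 1]
print(round(quadratic_weighted_kappa(human, machine, 0, 3), 3))
```

The same function can compare two human raters and then compare the automated model with each human rater.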

One novel response format that might be used with ICSB task type involves the use of multiple selected responses. In 
other words, an examinee would be asked to select from two or more lists of options that explain their thinking or choices. 
For example, in response to a scenario, a participant could be asked to formulate an answer using three drop-down menus: 
one to indicate how he or she would feel in response to that scenario, a second to indicate what he or she would do, and 
a third to provide an explanation of choice. This method captures more information per scenario and allows participants 
to more precisely describe how they would respond to a situation. Moreover, it offers the potential to elicit more in-depth 
information from respondents without having to use constructed-response items that necessitate human scoring. The 
multiple drop-down menus can also be used in ICSB items to measure emotion regulation, a key component of the act 
stage. For example, in response to a scenario, participants can be asked how they would feel and what they would do in response to those feelings. However, research suggests that this response format may be less familiar to participants (Heerwegh & Loosveldt, 2002) and may suffer from order effects (i.e., response options being selected based on their position in the list; Couper, Tourangeau, Conrad, & Crawford, 2004).

Nontraditional Behavioral Skills Tests 

Nontraditional behavioral skills tests (Gabrenya et al., 2012) represent another set of task types. Behavioral competen¬ 
cies such as flexibility, a key component of the act stage, may be captured by tasks such as those comprised by the Test 
of Attentional Performance battery (Zimmermann & Fimm, 2002). One of those tasks is the go/no-go task that requires 
participants to inhibit a response triggered by external stimuli. For example, an examinee may be asked to respond to go 
stimuli (e.g., a square in her screen) by pressing the space bar but refrain from pressing the key when she sees a circle 
(i.e., the no-go stimulus); the number of squares will far outweigh the number of circles, especially in the beginning, mak¬ 
ing pressing the space bar the dominant response. An individual’s ability to withhold responding to the no-go stimulus, 
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assessed by the number of incorrect keystrokes (the number of space bar presses after seeing a circle), is used to assess 
behavioral inhibition (Simmonds, Pekar, & Mostofksy, 2008). Performance on this task may capture an important ele¬ 
ment of ICC: inhibiting the cultural response patterns from one’s own culture and engaging in the norms of one’s host 
culture. Go would be an appropriate option for the behavior regulation subdimension of the current model’s act element. 
Additionally, several variants of this task exist (e.g., the Flanker task, which uses arrow keys; Koban & Pourtois, 2014), 
which would allow for more variety in the task types presented to assessment takers. Participant reactions could also 
be captured as a way of assessing tolerance for ambiguity. Delays in response time after errors could likewise be captured 
as a way of measuring reaction to errors (Koban & Pourtois, 2014). In the context of ICC, higher sensitivity to error 
information could contribute to greater success. Concerns over a lack of thematic continuity with the rest of the assessment 
could be addressed by embedding the basic task in a game set against a fictitious cultural backdrop. 
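The go/no-go scoring described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the scoring procedure of the Test of Attentional Performance: the trial record (`Trial`), the go-trial proportions in the hypothetical `make_sequence` generator, and the definition of post-error slowing used here (mean go-trial response time after a commission error minus mean go-trial response time after any other trial) are all illustrative choices.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Trial:
    stimulus: str            # "go" (square) or "nogo" (circle)
    pressed: bool            # True if the space bar was pressed
    rt_ms: Optional[float]   # response time in milliseconds; None if no press


def make_sequence(n_trials: int, early_go_rate: float = 0.9,
                  late_go_rate: float = 0.75, early_fraction: float = 0.3,
                  seed: int = 0) -> List[str]:
    """Hypothetical generator: go stimuli dominate throughout, and even more
    so in the opening block, so that pressing becomes the dominant response."""
    rng = random.Random(seed)
    n_early = int(n_trials * early_fraction)
    sequence = []
    for i in range(n_trials):
        go_rate = early_go_rate if i < n_early else late_go_rate
        sequence.append("go" if rng.random() < go_rate else "nogo")
    return sequence


def is_commission_error(trial: Trial) -> bool:
    """A space-bar press on a no-go (circle) trial."""
    return trial.stimulus == "nogo" and trial.pressed


def commission_errors(trials: List[Trial]) -> int:
    """Behavioral inhibition index: number of presses on no-go trials."""
    return sum(is_commission_error(t) for t in trials)


def post_error_slowing(trials: List[Trial]) -> Optional[float]:
    """Mean RT on go trials that follow a commission error minus mean RT on
    go trials that follow any other trial; positive values mean slowing.
    Returns None when either group is empty."""
    after_error: List[float] = []
    after_other: List[float] = []
    for prev, cur in zip(trials, trials[1:]):
        if cur.stimulus != "go" or not cur.pressed or cur.rt_ms is None:
            continue  # only correct go responses carry a usable RT
        group = after_error if is_commission_error(prev) else after_other
        group.append(cur.rt_ms)
    if not after_error or not after_other:
        return None
    return mean(after_error) - mean(after_other)
```

In a short run where one circle is pressed and the next go response takes 520 ms while other go responses average 400 ms, `commission_errors` returns 1 and `post_error_slowing` returns 120.0.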

Nontraditional behavioral skills prompts would use text entry as a response format. This response format can capture 
behavioral responses: like LATs, which monitor speed and keyboard input, text entry could produce a skill-level 
score based on speed and incorrect keystrokes. However, although this item format might be ideal for assessing the more 
difficult-to-capture skill dimensions (i.e., behavior regulation), it requires significant investment in development and pilot 
testing. Moreover, due to the novel nature of the examinee performance data generated by this response option, it is likely 
that normative performance data would be required to develop scoring guidelines. These items might also impose higher 
technological requirements on participants, both in terms of knowledge (e.g., computing ability) and equipment (e.g., more 
recent computers and faster internet connections). Finally, respondents may perceive these approaches as unrelated to ICC 
because of their low face validity. 
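Scoring a text-entry response on speed and incorrect keystrokes might look like the following minimal sketch. The keystroke log format (`Keystroke`, with a `"BACKSPACE"` sentinel), the comparison against a target string, and the correct-characters-per-minute speed metric are assumptions for illustration only; the report does not specify a scoring rule, and normative data would still be needed to turn these metrics into scores.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Keystroke:
    key: str     # a single typed character, or the sentinel "BACKSPACE"
    t_ms: float  # timestamp in milliseconds from prompt onset


def text_entry_metrics(target: str,
                       log: List[Keystroke]) -> Dict[str, Union[int, float, str]]:
    """Hypothetical skill-level metrics for a text-entry item."""
    typed: List[str] = []
    incorrect = 0
    for ks in log:
        if ks.key == "BACKSPACE":
            if typed:
                typed.pop()      # corrections are allowed but not rewarded
            continue
        position = len(typed)
        expected = target[position] if position < len(target) else None
        if ks.key != expected:
            incorrect += 1       # wrong character, or typing past the target
        typed.append(ks.key)
    # Elapsed time from first to last keystroke, in minutes.
    elapsed_min = (log[-1].t_ms - log[0].t_ms) / 60_000 if len(log) > 1 else 0.0
    correct_chars = sum(a == b for a, b in zip(typed, target))
    speed = correct_chars / elapsed_min if elapsed_min > 0 else 0.0
    return {
        "incorrect_keystrokes": incorrect,
        "correct_chars_per_min": speed,
        "final_text": "".join(typed),
    }
```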

Troy et al. Paradigm

Emotion regulation, the other subdimension of act, might also be measured in a nontraditional fashion using a recently 
developed paradigm (Troy, Wilhelm, Shallcross, & Mauss, 2010). The Troy et al. paradigm involves inducing a negative 
emotion in participants over a series of trials to assess emotion regulation skills. For the first induction, individuals view 
a video designed to trigger the desired emotion with no instructions; this trial serves as a baseline of emotional reactivity. 
Over subsequent inductions, individuals are given specific instructions to use a particular emotion regulation strategy 
(cognitive reframing: asking participants to think about the positive elements). The difference in reported emotion between 
the baseline and subsequent trials, as assessed by Likert-type items, is then used as a measure of emotion regulation ability. 
Results from Troy et al. (2010) suggest that it is a valid method (Gabrenya et al., 2012): participants engaging in the emotion 
regulation strategy experienced less sadness than those who were given no instructions. To increase the thematic continuity of the assessment, the 
emotion-generating stimuli could be cross-cultural in nature (e.g., a filmed confrontation around cultural differences). 
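The difference score described for this paradigm can be written as a short function. This is a simplified sketch, assuming a 1 to 7 Likert-type sadness scale and a plain baseline-minus-instructed difference of mean ratings; Troy et al.'s own analytic approach may differ, and the scale bounds and the function name `regulation_score` are hypothetical.

```python
from statistics import mean
from typing import Sequence


def regulation_score(baseline: Sequence[int], instructed: Sequence[int],
                     scale_min: int = 1, scale_max: int = 7) -> float:
    """Mean sadness rating after the uninstructed (baseline) induction minus
    mean rating after inductions with a reappraisal instruction.
    Larger positive values indicate a greater reduction in reported sadness."""
    if not baseline or not instructed:
        raise ValueError("both conditions need at least one rating")
    for rating in (*baseline, *instructed):
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating {rating} outside {scale_min}-{scale_max}")
    return mean(baseline) - mean(instructed)
```

For example, baseline ratings of 6, 6, 5, 7 and instructed ratings of 4, 3, 5, 4 yield a score of 2.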

Response formats for the paradigm of Troy et al. (2010) include Likert-type and forced-choice items. Likert-type items 
offer the flexibility to assess a single emotion, but forced-choice items are by necessity comparative. In other words, 
forced-choice items would require creating response options of equal valence. If the aim of the task is only to 
assess sadness, then forced-choice items might be difficult to generate. 

Conditional Reasoning 

Conditional reasoning items represent another potential task type to assess ICC. Conditional reasoning items are designed 
to tap the unconscious and implicit elements of attitudes and, as such, are a good option when socially desirable responding 
is a concern. They examine cognitive biases under the guise of an inductive reasoning test. The respondent is presented 
with a scenario or choice of some sort and asked to pick from several response options, each of which includes a reason. 
Conditional reasoning items disguise the “right” answer: the options include logic that appeals to the cognitive schemas 
of individuals at all levels of the construct. For example, a conditional reasoning test item related to positive cultural 
attitude, an approach subdimension, could ask the examinee to select the reason for the increase in American car quality over 
the past 15 years after the introduction of foreign cars to American markets. Two of the options are as follows: “American 
companies have learned a lot from their international counterparts about quality manufacturing” and “American car 
manufacturers rose to the challenge in order to drive away foreign competition.” An individual endorsing the former option 
makes a cooperative assumption, whereas an individual endorsing the latter option makes a more hostile and competitive 
one. A complete conditional reasoning test would score an individual’s latent level of the construct based on the number 
of times they endorsed the less positive options (C. M. Berry, Sackett, & Tobares, 2010). For measures of ICC attempting to 
assess general favorable attitudes toward culturally distinct others (essentially the inverse of ethnocentrism), the 
transparency of self-report items may severely restrict score variance, because most respondents can readily identify the 
favorable answer. Beyond attitudes in the approach stage, these items might also 
be used to test the cognitive skills of the analyze stage as a standardized cognitive path analysis, in which individuals are 
asked to describe which way of knowing is closest to how they arrived at an answer. For example, response options would 
contain a clause that addresses the reasoning that supports the correct option. In other words, responses to an item could all 
describe the same behavioral response to the situation but have a different explanation for why that behavior was correct. 
Initial evidence suggests that these items reduce faking (LeBreton, Barksdale, Robin, & James, 2007); however, conditional 
reasoning items require extensive development efforts and pilot testing, making them a high-investment option. 

The response format for conditional reasoning prompts could be a form of multiple choice that resembles the 
forced-choice response format. Each option presents an inference in reference to the prompt; two of the options contain 
framework-inconsistent inferences and serve only as distractors, one option reflects high levels of the target construct, 
and the fourth, low levels. The latter two response options are engineered to appeal to, or seem intuitive to, an individual 
who has a high or low standing on that construct, respectively. An examinee must select one explanation to stand in for 
his or her reasoning in order to complete the task. Evidence supports this particular form of multiple choice as being 
resistant to intentional faking (LeBreton et al., 2007). 
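Scoring the four-option format can be sketched as a tally of which kind of keyed option each examinee chose. The option-role labels (`"high"`, `"low"`, `"distractor"`), the item identifiers, the placeholder distractors C and D, and the choice to report distractor selections separately are illustrative assumptions. Following the description above of counting endorsements of the less positive options, the `low` count would serve as the construct score here, with higher counts indicating lower standing on the construct.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping


@dataclass
class ConditionalReasoningItem:
    item_id: str
    option_roles: Mapping[str, str]  # option letter -> "high" | "low" | "distractor"


def tally_endorsements(items: Iterable[ConditionalReasoningItem],
                       responses: Mapping[str, str]) -> Dict[str, int]:
    """Count how often an examinee chose each kind of option.
    Unanswered items are skipped."""
    counts = {"high": 0, "low": 0, "distractor": 0}
    for item in items:
        choice = responses.get(item.item_id)
        if choice is None:
            continue
        counts[item.option_roles[choice]] += 1
    return counts


# Hypothetical keying of the car-quality example; C and D are placeholder distractors.
car_item = ConditionalReasoningItem(
    "cars",
    {"A": "high",        # learned from international counterparts (cooperative)
     "B": "low",         # drove away foreign competition (competitive)
     "C": "distractor",
     "D": "distractor"},
)
```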

Incident Recollection 

Autobiographical incident recollection, scored via advanced word recognition software or keyword-based machine learning, 
can capture a variety of subdimensions. Individuals could be prompted to write short paragraphs about previous successful 
and unsuccessful cross-cultural experiences, or even theorize about what makes cross-cultural experiences successful, 
after which the automated scoring algorithm would look for keywords, phrases, and synonyms consistent with the proposed 
framework. Essay scoring options vary. For example, a score can be developed based on a frequency count of words 
related to specific skills (e.g., an analyze score created in part by the use of the words and phrases viewpoint, perspective, 
what they were thinking, how they might consider it, or in their shoes). An attitudinal score could be produced based on the overall valence 
(positivity-negativity) of the word choice. When paired with SJT stimuli, scoring the natural language of the respondent 
may be a productive method to assess whether their thought patterns map onto language consistent (or inconsistent, in 
the case of negative scoring) with the targeted constructs. This task type would rely primarily on the short answer response 
format, the benefits and drawbacks of which were previously discussed. Most notably, the short answer format is highly 
susceptible to faking, as participants could generate completely fictional accounts. 
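The two essay scores described above (a keyword frequency count and an overall valence score) can be sketched with regular expressions. This is a deliberately simple stand-in for the word recognition or machine learning software the text refers to: the analyze phrases come from the examples above, but the small positive and negative word lists are hypothetical placeholders, and a usable lexicon would need synonyms, inflected forms, and validation.

```python
import re
from typing import Iterable, Set

# Phrases taken from the examples in the text; synonyms and inflections
# (e.g., "viewpoints") would need to be added for real use.
ANALYZE_PHRASES = [
    "viewpoint", "perspective", "what they were thinking",
    "how they might consider it", "in their shoes",
]

# Hypothetical placeholder valence lexicons.
POSITIVE_WORDS: Set[str] = {"rewarding", "respect", "enjoyed", "curious", "helpful"}
NEGATIVE_WORDS: Set[str] = {"frustrating", "rude", "annoying", "awkward", "hostile"}


def analyze_score(text: str, phrases: Iterable[str] = ANALYZE_PHRASES) -> int:
    """Frequency count of analyze-stage phrases (case-insensitive, whole words)."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        for phrase in phrases
    )


def valence_score(text: str) -> float:
    """(positive words - negative words) / total words, bounded in [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    return (positive - negative) / len(tokens)
```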

Coaching Task 

For some testing situations, engendering specific emotions in the examinees may be considered inadvisable, especially 
negative emotions. In such cases, the following coaching paradigm might be used instead to test emotion regulation, the 
second subdimension of the act stage. Similar to ICSB items, these items would describe a cultural situation in which a friend 
has had a negative experience, accompanied by a picture or short GIF when not video based. The correct answer 
would be a plausible response to the situation combined with an emotion regulation strategy. Distractor options 
would include plausible responses that do not resolve the negative emotion expressed by the friend. Over several such 
items, it would be possible to assess an examinee’s inclination toward emotion regulation. Although assessing this inclination 
is not the same as measuring an ability, it does provide a proxy measure, intention, which has been shown to predict 
behavior (e.g., Ajzen, 1991). 
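Because the coaching task yields an inclination rather than an ability score, one simple summary is the proportion of items on which the examinee chose the keyed option. The sketch below assumes exactly one keyed option per item and excludes unanswered items from the denominator; the function name and item identifiers are hypothetical.

```python
from typing import Mapping


def regulation_inclination(keyed_options: Mapping[str, str],
                           responses: Mapping[str, str]) -> float:
    """Proportion of answered coaching items on which the examinee chose the
    option keyed as pairing a plausible response with an emotion regulation
    strategy. Unanswered items are excluded from the denominator."""
    answered = [item for item in keyed_options if item in responses]
    if not answered:
        raise ValueError("no coaching items were answered")
    hits = sum(responses[item] == keyed_options[item] for item in answered)
    return hits / len(answered)
```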

This item type could, like conditional reasoning items, use the forced-choice response format. However, it could 
also use more novel and interactive response formats, in particular a chat-based selected-response format. This format 
would mimic a chat room environment but use a computer-directed avatar rather than a human-in-the-loop. Using 
computer-generated responses would reduce the cost while still creating an interactive examinee experience, although 
developing items that use this format would require a resource-intensive initial investment. Such a format would facilitate 
a conversational tone. Participants could provide their advice and then be asked why they selected that advice option, 
providing an increased number of response combinations without necessitating an overwhelming number of response 
options within a single response list. 
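The advantage claimed for the two-step chat format, more response combinations without a longer list on any one screen, can be made concrete with a small data structure. The `ChatItem` class, its fields, and the example content (including the avatar message and the placeholder reasons) are hypothetical; the sketch models only the branching between an advice choice and its follow-up reasons, not the avatar's dialogue engine.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ChatItem:
    avatar_message: str                 # scripted message from the computer-directed avatar
    advice: Dict[str, str]              # advice id -> advice text (first screen)
    reasons: Dict[str, Dict[str, str]]  # advice id -> {reason id -> reason text} (follow-up)

    def response_paths(self) -> List[Tuple[str, str]]:
        """Every (advice, reason) combination an examinee could produce."""
        return [(a, r) for a in self.advice for r in self.reasons[a]]

    def max_options_per_screen(self) -> int:
        """Longest single list the examinee sees at any one time."""
        return max([len(self.advice)] + [len(opts) for opts in self.reasons.values()])


# Hypothetical example item: three advice options, each with three follow-up reasons.
item = ChatItem(
    avatar_message="My host family laughed when I refused second helpings. I feel embarrassed.",
    advice={
        "a1": "Ask them what the laughter meant.",
        "a2": "Accept seconds next time.",
        "a3": "Try not to dwell on it.",
    },
    reasons={a: {f"{a}-r{k}": f"placeholder reason {k}" for k in (1, 2, 3)}
             for a in ("a1", "a2", "a3")},
)
```

Here the examinee never sees more than three options at once, yet the item distinguishes nine advice-and-reason combinations.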

Additional Testing Considerations

Increased Psychological Fidelity

The assessment could also be adapted to replicate the cognitive and emotional complexity of real cross-cultural situations, 
a condition known as psychological fidelity. The inclusion of additional stimuli acknowledges the cognitive and emotional 
load present in cross-cultural interactions, which can be complex and challenging (Gabrenya et al., 2012). These stimuli 
could include foreign music (as a distraction), interrupting or competing tasks (increased cognitive load), or even minor 
emotional distress (e.g., a bad mood). This strategy would allow measurement conditions to more accurately reflect the 
conditions under which the assessed skills are used in reality and could improve the assessment’s ability to predict outcomes. 
These manipulations may also allow for the use of repeated measurement to tap other skills. For example, individuals could be 
asked to complete multiple rounds of the go/no-go task, with a negative mood induced between rounds. Emotion regulation 
(part of act) could then be assessed by the increase in errors in the second round, with a smaller increase suggesting more 
effective regulation. 
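The repeated-measurement design above implies a simple change score. The sketch below uses commission error rates rather than raw counts, an assumption that keeps the score comparable if the two rounds contain different numbers of no-go trials; the function names and the `(nogo_trials, commission_errors)` tuple layout are hypothetical.

```python
from typing import Tuple

Round = Tuple[int, int]  # (number of no-go trials, number of commission errors)


def commission_error_rate(nogo_trials: int, commission_errors: int) -> float:
    """Share of no-go trials on which the examinee pressed anyway."""
    if nogo_trials <= 0:
        raise ValueError("a round needs at least one no-go trial")
    if not 0 <= commission_errors <= nogo_trials:
        raise ValueError("error count must be between 0 and the number of no-go trials")
    return commission_errors / nogo_trials


def error_increase(before_induction: Round, after_induction: Round) -> float:
    """Post-induction error rate minus pre-induction error rate.
    Under the logic above, smaller increases suggest better emotion regulation."""
    return commission_error_rate(*after_induction) - commission_error_rate(*before_induction)
```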

Accessibility 

In line with the best practices for testing established by the Standards for Educational and Psychological Testing, a next- 
generation assessment should be designed to “facilitate accessibility and minimize construct-irrelevant barriers for all test 
takers in the target population, as far as possible” (AERA et al., 2014, p. 57). The target population for this next-generation 
measure of ICC, American-based higher education students, is a diverse one; many universities have made great strides 
in accessibility for students with disabilities, funding for disadvantaged students, and attracting international students. 
Thus, a universal design (the principle of design in which products and environments are created to the maximal extent 
to be usable to everyone without needing case-by-case adaptation; Measured Progress & ETS Collaborative, 2012) should 
be considered. In short, as items are being crafted, test developers should aim to include aids and other considerations 
for examinees with differing abilities, language and cultural backgrounds, socioeconomic status, genders, and ages. For 
example, if the cultural scenarios are text-based prompts, differences in reading level and working memory may affect 
examinees’ scores. Visual aids such as charts and pictures may be incorporated to offset these demands and 
serve as memory cues, should video-based vignettes prove infeasible. These graphics could then also be accompanied 
by written descriptions for students with visual impairment. Additionally, efforts should be made to reduce the use of 
idiomatic language, which can serve as a barrier for examinees who speak English as a second language (Sireci, 2011). 
Further, some item types, such as the go/no-go task, require significant bandwidth and computational processing speed, 
and examinees’ test-taking experience may then be adversely impacted by their lack of access to high-quality technology. 
The assessment could collect a baseline measurement by launching with a series of nonscored practice rounds so that 
technological differences might be taken into account for scoring purposes; a practice version would also serve as a tutorial 
to provide additional comfort to examinees with less exposure to such technology. 
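One way the practice-round baseline might be used is to subtract each examinee's typical latency on simple, nonscored practice trials from the response times on scored trials. This is an assumed approach, not one specified in the report or the Standards, and it has a cost: the subtraction removes individual differences in basic response speed along with device and connection latency, so whether it is appropriate depends on whether speed itself is part of the construct. The function name is hypothetical.

```python
from statistics import median
from typing import List, Sequence


def baseline_adjusted_rts(practice_rts: Sequence[float],
                          scored_rts: Sequence[float]) -> List[float]:
    """Subtract the examinee's median practice-round RT from each scored RT.
    The median limits the influence of occasional lag spikes."""
    if not practice_rts:
        raise ValueError("at least one practice trial is required")
    baseline = median(practice_rts)
    return [rt - baseline for rt in scored_rts]
```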

Conclusion 

ICC has been identified as a critical life skill likely to predict success in the 21st-century workforce. As universities begin 
to explore expanding traditional models of learning outcomes and emphasize these life skills, there is a need to assess 
whether students possess these critical competencies. In addition, assessments are needed to determine whether the abilities 
and skills underlying ICC improve during the student’s university tenure. Unfortunately, the current state of 
measurement of ICC leaves much to be desired, for several reasons. First, little consensus seems to exist regarding the 
requisite skills and abilities that contribute to ICC. Second, the measurement of ICC has overrelied on self-report methods 
that do not adequately cover the entire spectrum of the construct. Specifically, existing measures often tap self-referent 
cognitions without adequately capturing the affective and behavioral aspects that are inherent in intercultural interactions. 
Finally, the psychometric properties of existing measures leave much room for improvement. Although the reliabilities 
of existing measures meet professional standards, a relatively small number of studies provide evidence relating scores to 
other constructs, and even fewer provide evidence that the measures are related to outcomes of interest. 

The three-pronged framework proposed in this paper (approach, analyze, and act) is broad enough to cover important 
ICC construct domains but also specific enough to result in clear operational definitions that can be used to guide the 
design of an ICC assessment. First, the framework treats ICC as an interactive process rather than as a static 
construct. Second, the proposed framework follows this process through attitudinal, cognitive, and behavioral 
interactions that would likely occur in social cross-cultural communications. Finally, the framework is presented in a 
parsimonious fashion that enables clear interpretation of data that may result from a measure developed based on the 
framework. In addition to proposing a new framework, we discussed more innovative and interactive methods of 
assessing ICC that go beyond self-report. These methods have potential to improve the measurement of what has been an 
elusive construct, as well as to make the assessment experience enjoyable and insightful for students. It is our hope that 
the work presented in this paper will spur further discussion and examination of the ICC construct. In addition, we hope 
this continued discourse ultimately results in an operational measure of ICC that can assist higher education institutions 
in preparing a new generation of culturally competent global citizens. 

Appendix

Literature Search Strategy 

To conduct a comprehensive review of the literature, an iterative search process was implemented. The EBSCO 
database and Google Scholar were the primary databases used to obtain relevant articles. Keywords used in 
this search included intercultural competence, cross-cultural competence, college students, university students, postsecondary 
education, and higher education. 